1.  INTRODUCTION

In recent years concurrency has become ubiquitous in a wide range of software systems,
from high-performance computers to ordinary laptops, smartphones and even
embedded  systems.  The  concurrency  models  used  by  applications  running  on  these 
systems  differ  widely,  including  parallel  number  crunching,  task  synchronization,  and 
inter-thread  communication  for  hiding  I/O  latency,  among  many  others. 

The problem of concurrency cannot be successfully solved without considering software
engineering concerns. Today most software leverages libraries, frameworks and
other reusable software components, and is large enough to be difficult for a single
programmer to fully understand. This often leads to cases where a small change in one
component breaks a completely unrelated component. In addition to those correctness
concerns comes the question of efficiency. [Adve and Boehm 2010] show that correct
and efficient concurrency support requires programming language support. In particular,
it is shown that race freedom, at the very least, must be supported in programming
languages to allow efficient cooperation between hardware and software.

In this paper we present ÆMINIUM [Stork 2013], which is to our knowledge the first
system to combine automatic parallelization with type-based safe deterministic and
non-deterministic concurrency. The ÆMINIUM type system is based on access permissions,
which express constraints on program aliasing, allowing us to overcome one of
the major obstacles in prior automatic parallelization work. This aliasing information
allows the compiler to easily build a dependency graph and then to parallelize the code.
A novel permission splitting operation allows programmers to express when two operations
that access the same data are conceptually independent, allowing the compiler to
safely extract nondeterministic concurrency in addition to deterministic parallelism.

Our approach permits the user to expose potential parallelism in a predictable way
through permissions, but puts the runtime system in charge of the highly
platform-dependent task of scheduling that potential parallelism onto hardware resources.
Library code can also be more reusable, as the programmer only exposes potential
parallelism with permissions, rather than committing to a particular parallelization
strategy which may conflict with client code.

The  main  contributions  of  this  paper  are: 

—  A  concurrent-by-default  programming  language  that  leverages  permissions  and 
data  groups  to  automatically,  safely,  and  deterministically  parallelize  applications 
based  on  permission  flows.  While  an  initial  sketch  of  the  approach  was  presented 
in  [Stork  et  al.  2009],  this  paper  fills  in  the  sketch  to  show  how  the  system  actually 
works, and provides a different (and more workable) design for data group permissions.

—  A  safe  approach  to  integrating  nondeterminism  into  the  implicit  parallelism  model 
above. Our approach leverages access permissions to data groups, allowing developers
to explicitly specify when nondeterminism is permissible, while ensuring the
absence  of  data  races. 

—  A core calculus called μÆMINIUM which makes the model above precise and allows
formal  reasoning  about  the  system.  The  formal  system  consists  of: 

—  a  type  system  that  extracts  dependency  information  and  ensures  the  absence  of 
race  conditions; 

—  a concurrent-by-default evaluation semantics, which models dataflow parallelism
at a fine granularity, in contrast to prior type-based concurrency models
that  used  threads  or  explicit  fork-join  parallelism;  and 

—  a  proof  of  type  soundness  and  race  freedom. 

—  A  detailed  description  of  our  prototype  implementation  in  the  Plaid  programming 
language  infrastructure. 

—  Several  case  studies  to  evaluate  our  initial  implementation  which  show  the  benefits 
and  applicability  of  our  system  to  selected  example  programs. 

1.1.  Approach 

In ÆMINIUM the programmer uses permissions to specify which data is accessed
and in which way it needs to be accessed (e.g., whether access to the data may be
shared with other parts of the code or must be exclusive). Encoding this
permission information allows the system to check the correctness of each function
as well as their composition in a modular way. Based on the permission flow through
the application, ÆMINIUM infers potential concurrent executions by computing a data
flow graph [Rumbaugh 1975], which can then be executed by exploiting available, and
potentially concurrent, computation resources. ÆMINIUM’s type system prevents data
races by either enforcing synchronization when accessing shared data or by correctly
computing dependencies to ensure a happens-before relationship (meaning conflicting
accesses will be ordered according to their lexical order).

Note: ÆMINIUM is implemented in Plaid [Aldrich et al. 2009], which already has
first-class support for permissions. We therefore present all examples in ÆMINIUM/Plaid
syntax. Plaid’s syntax is sufficiently close to Java’s syntax to be readily understood. We
ignore Plaid’s special features (such as typestate) and, for the purposes of this paper,
we consider Plaid’s states to be equivalent to Java’s classes.

To illustrate these concepts, consider the transfer function shown below, which
transfers a specific amount between two bank accounts. It first withdraws the specified
amount of money from the ‘from’ account and then deposits the same amount into
the ‘to’ account.

    method void transfer(unique Account from,
                         unique Account to,
                         immutable Amount amount) {
        withdraw(from, amount);
        deposit(to, amount);
    }

For this example we assume that the order in which we perform the withdraw and
deposit operations does not matter. In particular, they could be executed concurrently
because both the withdraw and deposit operations should only affect the specified bank
account and no other. To encode this extra information ÆMINIUM uses permission
annotations. Permissions [Boyland 2003] specify aliasing and access information for
objects. The transfer method specifies that it requires a unique permission to both bank
accounts and an immutable permission to the amount parameter. The unique permission
means that there is only one valid reference to the specified object in the whole
system at the moment of a function call, and modifications to the object within the
function are possible. The immutable permission specifies that there might be multiple
aliases to this object but none of them can be used to change the object.

Assuming the method declarations for the deposit and withdraw methods given
below, ÆMINIUM is now able to compute the permission flow within the transfer
method. The unique permission of the ‘to’ parameter flows to the deposit method while
the unique permission of the ‘from’ parameter flows to the withdraw method. But we only
have one immutable permission to the ‘amount’ object while both withdraw and deposit
require one each. Because immutable permissions explicitly allow aliasing, ÆMINIUM
automatically splits the one immutable permission into two permissions, which are
then passed to the two method calls.

    method void withdraw(unique Account account,
                         immutable Amount amount) {...}

    method void deposit(unique Account account,
                        immutable Amount amount) {...}

The permission flow of the transfer method is shown in Figure 1. After the split
operation the unique ‘to’ and immutable ‘amount’ permissions are passed to the deposit
method while the unique ‘from’ permission and immutable ‘amount’ permission flow to
the withdraw method. After those methods complete, ÆMINIUM will automatically join
the previously split immutable permissions. The permission flow graph corresponds to
the data flow graph which is used to execute the transfer method. Although this example
illustrates only unique and immutable data, we will later show how ÆMINIUM
supports shared mutable data with shared permissions and an atomic synchronization
primitive.

Fig. 1: Permission Flow in the Transfer Example. We use the notation var : perm to indicate
that we have permission ‘perm’ for variable ‘var’.
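
The dataflow view of the transfer example can be mimicked in ordinary Python. The following is only a minimal sketch, not the ÆMINIUM runtime: the dictionary-based accounts, the thread pool and the function bodies are assumptions made for illustration. It shows the two calls running as independent tasks, followed by a join point that corresponds to the permissions flowing back to the caller.

```python
from concurrent.futures import ThreadPoolExecutor


def withdraw(account, amount):
    # Needs exclusive ("unique") access to `account`; only reads `amount`.
    account["balance"] -= amount


def deposit(account, amount):
    # Needs exclusive ("unique") access to `account`; only reads `amount`.
    account["balance"] += amount


def transfer(src, dst, amount):
    # The two calls touch disjoint accounts and share only the read-only
    # amount, so they are independent nodes of the dataflow graph.
    with ThreadPoolExecutor(max_workers=2) as pool:
        tasks = [pool.submit(withdraw, src, amount),
                 pool.submit(deposit, dst, amount)]
        for task in tasks:
            task.result()  # join point: wait for both nodes to finish


src = {"balance": 100}
dst = {"balance": 20}
transfer(src, dst, 30)
```

Because each task mutates a different account, the final balances do not depend on how the two tasks are scheduled.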

Note  that  in  this  example,  passing  a  unique  Account  object  to  be  modified  by  a 
method  is  isomorphic  to  passing  an  immutable  Account  object  as  an  argument  and 
receiving  an  updated  Account  as  the  result  of  the  method.  One  can  thus  think  of  state 
being threaded through the program following the permissions. In this sense, permissions
allow us to treat an imperative program as if it were purely functional, with
corresponding benefits for reasoning and parallelization. An analogy can be made to
monads [Moggi 1991] such as the state monad in Haskell, which conceptually threads
the state of the heap through the program computation. However, embedding permissions
in a linear logic and providing splitting rules, as discussed below, adds flexibility
compared  to  a  monadic  approach.  While  we  do  not  explore  the  monad  analogy  further 
in  the  paper,  we  believe  some  readers  may  find  it  helpful. 

In the following example we explore a hypothetical mistake, in which the programmer
tries to implement the available_balance method to compute the available balance
of a given account. For this the caller must pass in the account object along
with an immutable permission. Due to a mistake the programmer adds a call to the
withdraw method, which attempts to withdraw the specified amount from the given
account. The withdraw method, though, requires a unique permission to the account
and we only have an immutable permission to the specified account. This will result in
a typechecking error because an immutable permission cannot be converted into the
required unique permission. This is fortunate: because an immutable object can be
accessed in parallel, allowing a modifying access could result in a race condition.

    method immutable Amount available_balance(immutable Account account) {
        // ...
        withdraw(account, amount); // typecheck error
        // ...
    }
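
The rejected call can be illustrated with a small permission check written in Python. This is a hedged sketch rather than ÆMINIUM's type checker: the `SATISFIES` table and the `check_call` helper are hypothetical names, and the table assumes that a unique permission may be weakened to shared or immutable while shared and immutable permissions cannot be strengthened.

```python
# Assumed table: which required kinds a held permission kind can satisfy.
SATISFIES = {
    "unique": {"unique", "shared", "immutable"},
    "shared": {"shared"},
    "immutable": {"immutable"},
}


def check_call(callee, held, required):
    """Return one error message per argument whose held permission is too weak."""
    errors = []
    for name, needed in required.items():
        if needed not in SATISFIES[held[name]]:
            errors.append(f"{callee}: '{name}' needs {needed}, have {held[name]}")
    return errors


withdraw_signature = {"account": "unique", "amount": "immutable"}

# Inside available_balance only an immutable permission to the account is held.
errors = check_call("withdraw",
                    held={"account": "immutable", "amount": "immutable"},
                    required=withdraw_signature)

# Inside transfer a unique permission is held, so the same call is accepted.
ok = check_call("withdraw",
                held={"account": "unique", "amount": "immutable"},
                required=withdraw_signature)
```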


1.2.  Outline 

The  paper  is  organized  as  follows:  Section  2  provides  an  overview  of  the  concept  of  the 
ÆMINIUM language; Section 3 gives a detailed description of the core calculus; Section
4  presents  an  overview  of  our  initial  prototype  implementation  and  Section  5  presents 
its  evaluation;  Section  6  discusses  the  current  limitations  of  our  prototype  system  and 
future  work;  Section  7  compares  our  approach  with  previous  approaches  and,  finally, 
Section  8  concludes  the  paper. 

2.  OVERVIEW 

In this section we describe the ÆMINIUM programming language, which realizes a
concurrent-by-default programming model [Stork et al. 2009] with a concrete design
and precise semantics. ÆMINIUM uses access permissions [Beckman et al. 2008] for
objects and data group permissions for data groups [Leino 1998] to compute the permission
flow throughout the code (explained in the next subsections). The compiler
uses  this  information  to  compute  a  dataflow  graph,  which  can  then  be  executed  in 
parallel  on  available  computing  resources. 

While the general ÆMINIUM approach is language agnostic, we use an extended
Java syntax for presenting the examples in this section. This requires extending the
Java syntax with the missing language constructs and permission annotations. We are
currently working on a prototype implementation in the Plaid [Aldrich et al. 2009]
language. Plaid has permissions built in as a first-class language construct and therefore
requires only minor extensions to support ÆMINIUM.

2.1.  Access  Permissions 

Access Permissions (AP) have been studied in the past for checking interface protocol
compliance and verifying the correct use of synchronization [Beckman et al. 2008].
In ÆMINIUM we use access permissions, and more precisely the flow of the access
permissions through the application, to model possible concurrent execution strategies
for a program. Access permissions are abstract capabilities associated with object
references. The primary purpose of access permissions is to keep track of how many
references to a given object exist at a given moment in time, and to specify what kind of
operations are permitted on the object at that moment. In ÆMINIUM we adopted the
following three permission kinds:

unique. A unique access permission to an object reference indicates that there is
exactly one reference (the current reference to that object) at this moment in time.
A unique access permission allows clients to read and modify the object.

shared. A shared access permission to an object reference indicates that there are
an arbitrary number of references to the object in the system and all the permissions
are shared. A shared access permission allows the client to read and modify the
object.

immutable. An immutable access permission to an object reference indicates that
there are an arbitrary number of references to the object in the system and all of
them are immutable. An immutable access permission allows only read access to
the object.

Access permissions follow the rules of linear logic [Girard 1987]. They are analogous
to physical resources that are unavailable once consumed. Permissions can be converted
from one type to another as long as the previously described invariants hold.
For instance, a unique AP can be split into two shared APs. Because of the linearity of
APs the unique AP is gone, having been replaced by two shared APs. Each of the shared
APs can be further split into more shared APs, but not into unique or immutable
permissions. Using fractions [Boyland 2003] for keeping track of the individual AP allows
permissions to be joined, eventually enabling the recovery of a unique access permission.
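
The splitting and joining described above can be made concrete with a small fraction-tracking model in Python. This is an illustrative sketch only: the `Permission` class, its `consumed` flag and the halving strategy are assumptions, not the bookkeeping used by ÆMINIUM. It shows linearity (a split permission is used up) and the recovery of a unique permission once the fractions add up to one again.

```python
from fractions import Fraction


class Permission:
    """A permission to one object, tagged with the fraction of the whole it carries."""

    def __init__(self, target, kind, fraction=Fraction(1)):
        self.target = target
        self.kind = kind          # "unique", "shared" or "immutable"
        self.fraction = fraction
        self.consumed = False     # linearity: a permission is used up only once

    def _consume(self):
        if self.consumed:
            raise ValueError("permission already consumed")
        self.consumed = True

    def split(self, kind):
        """Consume this permission and return two halves of the given kind."""
        if kind == "unique":
            raise ValueError("a unique permission cannot be duplicated")
        if self.kind != "unique" and kind != self.kind:
            raise ValueError(f"cannot split {self.kind} into {kind}")
        self._consume()
        half = self.fraction / 2
        return Permission(self.target, kind, half), Permission(self.target, kind, half)

    @staticmethod
    def join(a, b):
        """Consume two pieces of the same kind; a total of 1 recovers unique."""
        if a is b or a.target != b.target or a.kind != b.kind:
            raise ValueError("can only join distinct like permissions to one object")
        a._consume()
        b._consume()
        total = a.fraction + b.fraction
        kind = "unique" if total == 1 else a.kind
        return Permission(a.target, kind, total)


account = Permission("account", "unique")
left, right = account.split("shared")        # unique -> two shared halves
left_a, left_b = left.split("shared")        # shared -> two shared quarters
rejoined = Permission.join(Permission.join(left_a, left_b), right)
```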

The  type  system  computes  the  AP  flow  in  the  program  and  automatically  splits/joins 
APs as needed. In ÆMINIUM two expressions may execute concurrently if their permissions
do not interfere: that is, they have a disjoint set of unique permissions or an
arbitrary set of overlapping shared and immutable permissions. To avoid data races
ÆMINIUM only allows access to shared data within atomic blocks. The AP flow obeys
the  lexical  order  of  statements,  meaning  that  if  two  pieces  of  code  need  the  same  unique 
AP,  the  unique  AP  will  first  flow  to  the  first  expression  and  then  to  the  second  one. 
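
The interference rule and the lexical ordering of conflicting accesses can be sketched as a small dependency computation in Python. The representation below (each statement as a name plus a dictionary from variable to required permission kind) and the third `audit` call are assumptions introduced for illustration; the actual analysis is performed by the ÆMINIUM type system.

```python
def interferes(needs_a, needs_b):
    """Two statements interfere if they name the same variable and either wants it unique."""
    return any(var in needs_b and "unique" in (kind, needs_b[var])
               for var, kind in needs_a.items())


def dependencies(statements):
    """Map each statement to the lexically earlier statements it must wait for."""
    deps = {}
    for index, (name, needs) in enumerate(statements):
        deps[name] = [earlier for earlier, earlier_needs in statements[:index]
                      if interferes(earlier_needs, needs)]
    return deps


body = [
    ("withdraw", {"from": "unique", "amount": "immutable"}),
    ("deposit", {"to": "unique", "amount": "immutable"}),
    ("audit", {"from": "unique"}),  # hypothetical extra call, not in the paper
]
deps = dependencies(body)
```

Here `withdraw` and `deposit` have no dependencies and may run concurrently, while the hypothetical `audit` call must wait for `withdraw` because both need the unique permission to `from`.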

2.2.  Data  Groups 

Although  pure  APs  define  a  clean  execution  model  for  unique  and  immutable  data, 
our permission splitting rules will allow all operations on shared data to proceed
concurrently. We need a way to express when one operation on a shared data structure
depends on another. Furthermore, we’d like to control these dependencies, as well as
synchronization on shared data, at a granularity greater than one object at a time.

To address this challenge we leverage data groups (DG, [Leino 1998]). A data group
represents an abstract collection of objects. Using data groups for grouping multiple
objects differs from previous work [Leino et al. 2002], which used data groups exclusively
to partition the state of one object. When an object is part of a data group, we
say that this object is owned by that data group. In ÆMINIUM each shared object
must be part of exactly one data group. The specific data group an object is in can
change during execution. To transfer a shared object from one data group to
another one, all shared permissions to the object must be joined into a unique permission.
Only when a unique permission has been reassembled is it possible to split this
unique permission into shared permissions associated with a different data group. We
write shared(myGroup) to indicate that the shared object is part of the data group
myGroup. Data groups need to be declared in a state but are instance specific (like
instance-specific fields). When an object is allocated, the data groups associated with
it are instantiated by the compiler/runtime system. The global set of data groups
partitions the heap of shared objects into disjoint parts.

Additionally,  we  adapt  the  concept  of  access  permissions  to  data  groups  and  call 
them data group permissions (GP). ÆMINIUM currently defines the following data
group  permissions: 

exclusive. There is at most one exclusive GP to a data group in the whole system
at a time. This resembles a unique AP. Similar to a unique permission, an exclusive
GP represents the only currently existing permission through which the data of the
data group can be accessed.

An exclusive group permission behaves like “thread-local” data (although we do not
have the notion of threads in ÆMINIUM). An execution path that holds an exclusive
group permission can safely access the associated shared objects of the group
without synchronization. This is an important feature as many data structures
intrinsically require shared access permissions to the objects they are composed of
(e.g., a doubly linked list which requires at least two valid references to the linked
node objects).

Fig. 2: Permissions in ÆMINIUM. Shows different permission kinds and what each permission
controls (including arity). Access permissions control access to objects and group permissions
control access to data groups of shared objects. There can only exist one unique, exclusive or
protected permission to an object or data group at a time in the system, while there can be an
arbitrary number of shared and immutable permissions. Shared permissions refer to the data
group to which they belong (e.g., shared(a) means the object belongs to data group a).

shared.  A  shared  GP  resembles  a  shared  AP:  there  can  be  an  arbitrary  number  of 
shared GPs in the system. Having a shared GP does not grant any kind of access to
the  associated  data  because  there  is  the  danger  of  data  races. 

protected.  A  protected  GP  indicates  that  access  to  the  shared  data  is  safe  because 
the  access  to  the  shared  data  group  has  been  protected  by  a  corresponding  atomic 
block.  The  semantics  of  protected  permissions  is  that  there  can  only  be  one  protected 
permission per data group at a time. This is enforced by the runtime system. In contrast
to an exclusive permission, a protected permission cannot be split into shared
permissions;  doing  so  would  be  tantamount  to  requesting  concurrency  within  an 
atomic  block,  likely  with  confusing  and  even  error-prone  semantics. 

Figure 2 provides a global overview of all available permissions in the ÆMINIUM
system. Access permissions are used to classify object references and consist of unique,
shared and immutable. By definition every shared object must be associated with a
data group (e.g., a), for which we use the data group permissions exclusive, shared and
protected.

2.2.1.  Management of Data Group Permissions. Unlike the automatic splitting of access
permissions, data group permissions are split and joined manually to provide the
programmer with better control over dependencies between operations. By default, each
operation on a data group depends on the previous operation on that data group; when
the operations are conceptually independent, an explicit split block is used to split an
exclusive GP into an arbitrary number of shared GPs (see Figure 3). The split block
specifies data groups for which it splits the available permission (either exclusive or
shared) into more shared permissions (one for each statement in the body). Group
permissions to data groups not mentioned are simply passed into its body. The available
permissions inside the body are partitioned into disjoint sets. Each one of those permission
subsets flows to one statement of the body. This means that if multiple statements
in the block require the same unique AP, or any GP that is not mentioned in the split
block, then the code will not typecheck because permissions cannot be duplicated. After
the completion of all body statements, the split block joins the generated shared
permissions back to the permission that existed before the block was entered.

In order to give programmers control over the granularity of synchronization, each
atomic block protects access to objects in the particular data groups that are specified
at the atomic block entry point. It will provide a protected GP for the specified data
group to its body expression. The specification of the data group is optional as the
compiler can automatically infer the required data groups from the provided arguments
at the call site. This is similar to C++, which can deduce template parameter types
from the provided arguments. In ÆMINIUM’s case the type of the arguments encodes
which data groups the shared objects are associated with and the compiler can use
this information to deduce the required data group parameter information. Providing
an explicit annotation, however, provides useful documentation of the programmer’s
intent and helps catch unintended data accesses. In particular, the semantics of the
atomic block is that its body is executed as if it has exclusive access to the shared data
associated with the specified data group. Similar to the split block, the atomic block
will upon its completion revert the GP to the state it was in before entering the atomic
block. The semantics of split and atomic blocks is illustrated by example in Figure 3.
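
The enter/exit behaviour of split and atomic blocks can be modelled with Python context managers. This is a sketch under stated assumptions, not ÆMINIUM's implementation: `GroupPermissions` is a hypothetical class that only tracks which group permission is currently held, and it assumes that an atomic block is entered with a shared permission. Both blocks restore on exit whatever permission was held on entry.

```python
from contextlib import contextmanager


class GroupPermissions:
    """Tracks the group permission currently held for each data group."""

    def __init__(self, **held):
        self.held = dict(held)

    @contextmanager
    def split(self, group):
        before = self.held[group]
        if before not in ("exclusive", "shared"):
            raise TypeError(f"cannot split a {before} permission")
        self.held[group] = "shared"
        try:
            yield
        finally:
            self.held[group] = before  # join back to the permission held on entry

    @contextmanager
    def atomic(self, group):
        before = self.held[group]
        if before != "shared":  # assumption: atomic starts from a shared permission
            raise TypeError(f"atomic expects a shared permission, got {before}")
        self.held[group] = "protected"
        try:
            yield
        finally:
            self.held[group] = before  # revert on leaving the atomic block


perms = GroupPermissions(gr="exclusive")
trace = [perms.held["gr"]]
with perms.split("gr"):
    trace.append(perms.held["gr"])
    with perms.atomic("gr"):
        trace.append(perms.held["gr"])
    trace.append(perms.held["gr"])
trace.append(perms.held["gr"])
```

The recorded `trace` follows the sequence exclusive, shared, protected, shared, exclusive. Attempting `split` while a protected permission is held raises an error in this model, mirroring the rule that a protected permission cannot be split.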

Data groups are declared inside states in a similar way to fields (see Figure 4, line
6). Data groups are only visible inside states and their sub-states (similar to Java’s
protected). Before accessing data associated with those inner groups, the programmer
must gain access to those data groups via an ‘unpackInnerGroups { ... }’ construct.
The unpackInnerGroups block, similar to the focus operation from [Fähndrich
and DeLine 2002], trades the permission to the owner group of the receiver object
for permissions to inner groups defined in the receiver’s state. This exchange prohibits
recursive method calls from accessing the same inner groups, which would violate the
permission invariants (e.g., only one exclusive data group permission per data group).
When an unpackInnerGroups block is entered, the exclusive permission for
the "owner" is replaced by exclusive permissions for the inner data groups of the receiver
object (i.e., the "this" object). This approach transitively avoids the need for
synchronization. Analogously, when the client has either a shared or protected permission
to the owner (rather than exclusive), the owner permission is replaced by a
shared permission to the inner groups. The unpackInnerGroups block could automatically
be inferred by the compiler (by simply determining which statements need inner
data groups and wrapping them in an unpackInnerGroups block), but adding it explicitly
aids in documenting the programmer’s intent. Despite the manual group permission
management, ÆMINIUM’s type system guarantees the absence of race conditions.
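
The permission exchange performed by unpackInnerGroups amounts to a simple mapping, sketched here in Python. The function name `unpack_inner_groups`, its dictionary representation of held group permissions and the `owner` key are hypothetical; the mapping itself (exclusive owner gives exclusive inner groups, shared or protected owner gives shared inner groups) is the one described in the text.

```python
def unpack_inner_groups(held, inner_groups, owner="owner"):
    """Trade the owner permission for permissions to the receiver's inner groups."""
    remaining = dict(held)
    owner_perm = remaining.pop(owner)  # the owner permission is given up inside the block
    inner_perm = "exclusive" if owner_perm == "exclusive" else "shared"
    for group in inner_groups:
        remaining[group] = inner_perm
    return remaining


from_exclusive = unpack_inner_groups({"owner": "exclusive", "data": "shared"}, ["internal"])
from_shared = unpack_inner_groups({"owner": "shared", "data": "shared"}, ["internal"])
from_protected = unpack_inner_groups({"owner": "protected", "data": "shared"}, ["internal"])
```

Removing the owner permission inside the block is what prevents a recursive call from unpacking the same inner groups a second time in this model.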

2.2.2.  Discussion and List Example. The introduction of data groups and data group
permissions allows programmers to introduce nondeterminism when they need it, but
ensures that they are explicit about where nondeterminism is permitted and helps
them to control the granularity of parallelization, and therefore of synchronization.
Nondeterminism can only be introduced via explicit split blocks, and its impact is
limited to accesses within that block. This explicitness helps ensure that programmers
have thought about the semantics of their program enough to avoid errors due to
unexpected nondeterminism. Furthermore, data groups allow coarse-grained
synchronization because an atomic block on a data group protects all the objects within that
data group, eliminating the need to synchronize separately on each object. In the case
of an exclusive group permission, no synchronization is needed at all.

    // gr : gp with
    // gp ∈ {exclusive, shared}
    split (gr) {
        // gr : gp with
        // gp : shared
        atomic (gr) {
            // gr : protected
        }
        // gr : gp with
        // gp : shared
    }
    // gr : gp with
    // gp ∈ {exclusive, shared}

(a) Split/Atomic Block

(b) Group Permission Conversion Diagram [diagram not reproduced: split leads from
exclusive to shared and from shared to shared; atomic leads from shared to protected]

Fig. 3: Group Permission Splitting/Joining via Split and Atomic blocks. The notation gr : gp
means that we have group permission gp for data group gr.

To make this more clear, consider the doubly linked list example in Figure 4. In line
5, the DoubleLinkedList state is defined with group parameter data, using the same
syntax as Java type parameters. The data group parameter specifies the data group
to which the objects stored in the list belong. Line 6 defines a new data group called
‘internal’. Line 9 declares the ‘head’ field pointing to the chain of ‘DoubleLinkedListItems’,
which are all associated with the ‘internal’ data group of the surrounding
‘DoubleLinkedList’. Because inner groups are not visible outside the state it is impossible
for these objects to leave the scope of the state. This strong encapsulation resembles
ownership types [Clarke et al. 1998], and allows ÆMINIUM developers to incrementally
refine their internal data structures to increase internal concurrency (e.g., in our
case study below, modifying a hash table that uses one data group for all hash buckets
to an implementation that uses one data group per hash bucket).

Lines 12 and 24 show the definitions of two add functions that specify data group
parameters along with their required permissions. The signatures of the two add methods
are identical, with the exception that the add method in line 12 requires an exclusive
permission to the data group that owns the receiver, while the add method in line 24
requires a shared GP. The effect of this difference can be observed in the implementation
of the corresponding bodies. In the case of the add method that requires an exclusive
permission to the receiver’s data group, the unpackInnerGroups block can provide an
exclusive permission to the inner data groups, which in turn allows the programmer to
access the shared inner state without any synchronization. In the case of the add method
that requires a shared permission to the receiver’s data group, the unpackInnerGroups
block can only provide a shared permission to the inner data groups, requiring the
programmer to synchronize on the inner data group (line 30).
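
The practical difference between the two add methods can be approximated in Python. The sketch below is an analogy only: a `threading.Lock` stands in for the atomic block on the inner data group, a plain list stands in for the chain of list items, and the method names `add_exclusive` and `add_shared` are invented because Python has no overloading on permissions.

```python
import threading


class DoubleLinkedList:
    def __init__(self):
        self.items = []                         # stands in for the linked item chain
        self.internal_lock = threading.Lock()   # stands in for atomic on 'internal'

    def add_exclusive(self, item):
        # Caller holds an exclusive permission to the owner group: no other
        # execution path can reach the items, so no synchronization is needed.
        self.items.append(item)

    def add_shared(self, item):
        # Caller holds only a shared permission: protect the inner group.
        with self.internal_lock:
            self.items.append(item)


def producer(target, count):
    for value in range(count):
        target.add_shared(value)


shared_list = DoubleLinkedList()
shared_list.add_exclusive("first")  # safe while only one path can reach the list
workers = [threading.Thread(target=producer, args=(shared_list, 500)) for _ in range(4)]
for worker in workers:
    worker.start()
for worker in workers:
    worker.join()
```

In ÆMINIUM the choice between the two variants is enforced statically by the group permission in the method signature, whereas in this Python analogy it is left to the caller's discipline.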

Note that the current design of ÆMINIUM only protects against race conditions and
not  against  deadlocks.  The  latter  has  been  handled  in  prior  work  [Boyapati  et  al.  2002], 
which  is  orthogonal  to  our  approach,  and  is  left  out  of  this  discussion  for  simplicity. 

2.3.  Producer/Consumer  Example 

After the discussion of access permissions, data groups and their relationships we now
present a producer/consumer example in ÆMINIUM (see Figure 5). The program starts
execution at the global entry method main (line 19). When entering the body it has
an exclusive permission to a data group a. This permission will first flow into the
createQueue method call (line 21). The exclusive permission matches the method
permission requirements as specified in line 16. After the createQueue call returns the
exclusive permission to a, the permission flows into the split block at line 23. As
previously described, the split block will replace the exclusive permission with one corre-
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1  state  DoubleLinkedListltem(data)  { 

2  ...II  standard  double  linked  list  item 

3  } 

4 

5  state  DoubleLinkedList(data)  { 

e  group  (internal)  //  inner  data  group 

7 

8  //  ‘head’  belonging  to  inner  data  group  ‘internal’ 

9  shared  (internal)  DoubleLinkedListltem  (internal,  data)  head; 

10 

11  method  void 

12  add(exclusive  owner,  shared  data)(shared(data)  Object(data)  o) 

13  :  shared  (owner)  //  shared  permission  to  the  receiver 

14  { 

15  //  owner  :  exclusive ,  data  :  shared 

16  unpacklnnerGroups  { 

17  //  internal  :  exclusive ,  data  :  shared 

is  //  access  internal  data  directly 

19  1 

20  //  owner  :  exclusive ,  data  :  shared 

21  } 

22 

23  method  void 

24  add  (shared  owner,  shared  data)  (shared  (data)  Object  (data)  o) 

25  :  shared  (owner)  //  shared  permission  to  the  receiver 

26  { 

27  //  owner  :  shared ,  data  :  shared 

28  unpacklnnerGroups  { 

29  //  internal  :  shared ,  data  :  shared 

30  atomic  (internal)  { 

31  //  internal  :  protected ,  data  :  shared 

32  //  need  protection  to  access  internal  data 

33  } 

34  } 

35  //  owner  :  shared ,  data  :  shared 

36  } 

37 

38  } 


Fig.  4:  A  DoubleLinkedList  with  Data  Groups.  The  example  has  two  add  methods.  The  first  one 
requires  an  exclusive  permission  to  the  owner  and  transitively  provides  an  exclusive  permission 
to  the  inner  groups,  and  does  not  requires  synchronization.  The  second  version  only  requires 
a  shared  permission  to  the  owner  and  only  provides  shared  permissions  to  the  inner  groups, 
requiring  synchronization  i.e.  atomic  blocks.  In  comments  V/’  we  show  which  permissions  we 
currently  hold  via  the  notation  dg  :  gp,  meaning  for  data  group  dg  we  have  permission  gp. 


sponding  shared  permission  for  each  statement  in  its  body.  This  leads  to  the  fact  that 
one  shared  permission  to  a  is  flowing  in  parallel  to  the  producer  and  consumer  method 
calls  (line  24  +  25).  After  those  calls  have  been  completed,  and  therefore  have  returned 
their  shared  permissions  to  a,  the  share  block  will  collect  them  and  join  them  back 
together  to  an  exclusive  permission  (line  26).  This  newly  gained  exclusive  permission 
is  then  fed  to  the  disposeQueue  method  call.  Note  that  if  either  producer  or  consumer 
want  to  access  the  shared  queue,  they  first  have  to  protect  their  access  to  this  data 
group  via  an  atomic  block  (lines  4  and  11).  Figure  6  shows  the  resulting  permission 
flow  and  the  derived  data  flow  graph  for  this  example  program. 
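The permission flow described above can be mimicked outside of ÆMINIUM. The following Python sketch is only an illustration under our own assumptions: the names `Permission`, `split`, `join` and `main_flow` are hypothetical and not part of ÆMINIUM, and the sketch models only which kind of permission each call of main in Figure 5 receives, not the parallel execution itself.

```python
from dataclasses import dataclass
from typing import List


@dataclass(frozen=True)
class Permission:
    group: str  # data group name, e.g. "alpha"
    kind: str   # "exclusive", "shared" or "protected"


def split(perm: Permission, branches: int) -> List[Permission]:
    """Entering a split block: hand one shared permission to each branch."""
    if perm.kind not in ("exclusive", "shared"):
        raise ValueError("only exclusive or shared permissions can be split")
    return [Permission(perm.group, "shared") for _ in range(branches)]


def join(returned: List[Permission], original: Permission) -> Permission:
    """Leaving a split block: every branch must have returned its permission."""
    expected = Permission(original.group, "shared")
    if not returned or any(p != expected for p in returned):
        raise ValueError("a branch did not return its shared permission")
    return original


def main_flow() -> List[str]:
    """Record the permission kind that each call in main (Figure 5) receives."""
    trace = []
    alpha = Permission("alpha", "exclusive")
    trace.append("createQueue:" + alpha.kind)
    for_producer, for_consumer = split(alpha, 2)
    trace.append("producer:" + for_producer.kind)
    trace.append("consumer:" + for_consumer.kind)
    alpha = join([for_producer, for_consumer], alpha)
    trace.append("disposeQueue:" + alpha.kind)
    return trace
```

Calling `main_flow()` lists the four calls in program order together with the permission kind they hold, which corresponds to the edge labels of the permission flow.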
 1  state ProducerConsumer {
 2    method void producer(shared γ)(shared(γ) Queue(γ) q) {
 3      // α : shared
 4      atomic (γ) {
 5        // α : protected
 6
 7      }
 8    }
 9    method void consumer(shared γ)(shared(γ) Queue(γ) q) {
10      // α : shared
11      atomic (γ) {
12        // α : protected
13
14      }
15    }
16    method shared(γ) Queue(γ) createQueue(exclusive γ)() {...}
17    method void disposeQueue(exclusive γ)(shared(γ) Queue(γ) q) {...}
18
19    method void main(exclusive α)() {
20      // α : exclusive
21      shared(α) Queue(α) q = createQueue(α)()
22
23      split (α) {
24        producer(α)(q)  // α : shared
25        consumer(α)(q)  // α : shared
26      }
27      // α : exclusive
28      disposeQueue(α)(q)
29    }
30  }


Fig.  5:  Producer/Consumer  Example 


[Diagram labels, top to bottom: α : exclusive; createQueue; α : exclusive; α : shared (twice); α : shared (twice); α : exclusive]


Fig.  6:  Data  Flow  Graph  for  Producer/Consumer  Example 


ACM  Transactions  on  Programming  Languages  and  Systems,  Vol.  V,  No.  N,  Article  A,  Publication  date:  January  YYYY. 




Stork  et  al. 


method void exchange(exclusive S,
                     exclusive I,
                     exclusive O)(shared(S) Socket s,
                                  shared(I) Packet inp,
                                  shared(O) Packet outp) {
  receivePacket(S, I)(s, inp);
  checkPacket(I)(inp);
  updatePacket(O)(outp);
  sendPacket(S, O)(s, outp);
}


Fig.  7:  Exchange  Source  Code 


Fig.  8:  Data  Flow  Graph  for  exchange  Function  (for  simplicity  we  show  only  the  flow  of  data 
group  permissions  as  the  access  permissions  do  not  cause  additional  dependencies) 

2.4.  Dataflow  is  not  Fork/Join 

ÆMINIUM supports both dataflow and fork/join parallelism. To better understand
the difference between those concepts, consider the example shown in Figure 7. The
exchange function, which could be part of a bi-directional ring network implementation,
receives a new packet via the provided socket s into the Packet inp. It then checks
the newly received packet inp for errors (e.g., that checksums match). The function
then updates the outgoing packet outp (e.g., updating header fields and recomputing
checksums), before this packet is sent through the socket.

Assuming that all functions called in the exchange method require exclusive permissions
to the corresponding data groups, the permission flow forms a graph as shown
in Figure 8. The graph shows that receiving the incoming packet can be performed in
parallel to updating the outgoing packet. As soon as the incoming packet has been
received, it can be checked. Once the updates of the outgoing packet have also
completed, the outgoing packet can be sent in parallel to the checking
of the incoming packet. This kind of parallelism is naturally supported by ÆMINIUM's
dataflow approach, but cannot be directly expressed in a fork/join paradigm unless
extra dependencies or synchronization are used.
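To make the dependency structure concrete, the following Python sketch derives the data flow edges of exchange from the data groups each call requires. It is an illustration only: the names `EXCHANGE`, `dependencies` and `ready` are hypothetical, and the sketch assumes, as in the text, that every call needs its groups exclusively, so a call depends on the most recent earlier call that used any of its groups.

```python
from typing import Dict, List, Set, Tuple

# Calls of exchange (Figure 7) in program order, each with the data groups
# it is assumed to require exclusively.
EXCHANGE: List[Tuple[str, Set[str]]] = [
    ("receivePacket", {"S", "I"}),
    ("checkPacket", {"I"}),
    ("updatePacket", {"O"}),
    ("sendPacket", {"S", "O"}),
]


def dependencies(calls: List[Tuple[str, Set[str]]]) -> Dict[str, Set[str]]:
    """Map each call to the earlier calls whose permissions it must wait for."""
    last_holder: Dict[str, str] = {}
    deps: Dict[str, Set[str]] = {}
    for name, groups in calls:
        deps[name] = {last_holder[g] for g in groups if g in last_holder}
        for g in groups:
            last_holder[g] = name
    return deps


def ready(calls: List[Tuple[str, Set[str]]], finished: Set[str]) -> List[str]:
    """Calls that may run now, given the set of calls that already finished."""
    deps = dependencies(calls)
    return [n for n, _ in calls if n not in finished and deps[n] <= finished]
```

In this sketch sendPacket waits for receivePacket and updatePacket but not for checkPacket, and checkPacket does not wait for updatePacket. A single fork/join barrier after the first two calls would impose both of these extra waits.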

3.  FORMAL  LANGUAGE 

This section formalizes the object-oriented µÆMINIUM core language. We briefly discuss
the syntax of the language and then elaborate on how the static and dynamic
semantics of the calculus prohibit race conditions. We conclude this section by describing
the soundness properties we have proved for µÆMINIUM. The goal of µÆMINIUM
is to explore a simple, efficient mechanism to track data dependencies via permission
flow and to guarantee the absence of race conditions. Because only shared data can
lead to race conditions, and because the tracking of object permissions and data group
permissions can be done using similar mechanisms, we focus the core calculus on modeling
data groups and data group permissions. We assume that all data is implicitly shared
and omit immutable and unique permissions from our formal system (note that our
implementation supports all discussed permissions). µÆMINIUM's typechecking
rules generate a data group configuration representing the graph of dependencies
between primitive expressions in the language; this configuration is used along with
run-time permissions to model parallel execution in the dynamic semantics.


3.1.  Syntax 

The grammar of µÆMINIUM is shown in Figure 9 and is formulated as an extension to
Featherweight Java (FJ, [Igarashi et al. 2001]). Our extensions are highlighted in red.

In a nutshell the major extensions to FJ are: i) addition of data group parameters to
method calls, and class and method declarations; ii) addition of group types, and
extension of object types to be parameterized with group parameters; iii) new language
constructs to deal with data groups and to support assignment.

We use the overbar notation to abbreviate a list of elements (e.g., $\overline{x : T} = x_1 : T_1, \ldots, x_n : T_n$).
Unless otherwise mentioned this notation includes the empty list.
We write $\bullet$ to indicate the empty sequence.

A program consists of a set of classes and a main method. In µÆMINIUM the global
starting expression of FJ is explicitly wrapped in a main method, to provide an initial
data group for the top level objects. A class declaration (CL) gives the class a unique
name C and defines its data group parameters, internal data groups (G), fields (F) and
methods (M). Note that the sequence of data group parameters may not be empty, and
instead of having an explicit owner parameter, the first data group parameter specifies
the data group to which the class instances belong. µÆMINIUM does not provide an
explicit constructor. Upon creation of a new object all its fields are initialized to null
and must later be explicitly set. Fields (F) are declared with a name and type. Data
groups (G) are declared by name, which is passed to the group constructor. Methods
(M) specify their result type, the data group permissions they require, their formal
parameters and a body expression.

We syntactically distinguish between expressions and possibly effectful atoms.
Atoms are straightforward and consist of field read and assignment, method invocation,
and new object creation. Besides the standard let binding (let), expressions
consist of atomic blocks (atomic), which specify the data group they protect access
to and a body expression; an operation that exchanges permission to the owner of an
object for permission to its inner data groups (unpackGroupsOf), which specifies the
object and an expression which should gain access to the inner groups of the specified
object (the unpackInnerGroups of ÆMINIUM essentially limits the object reference to
the receiver object); and a share primitive (split), which specifies which data groups
should be shared between the two specified expressions. Note that the sequence of
data group references in the share construct must be non-empty. The inatomic primitive
(inatomic) does not appear at the source level and is only used as an intermediate
form for tracking entered atomic blocks. We use a global class table (CT) to map class
names to class declarations.
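\[
\begin{array}{llcl}
\text{(programs)}     & P    & ::= & (\overline{CL},\ \mathit{main}) \\
\text{(class decl.)}  & CL   & ::= & \mathsf{class}\ C(\alpha, \overline{\beta})\ \mathsf{extends}\ D(\alpha)\ \{\overline{G}\ \overline{F}\ \overline{M}\} \\
\text{(field decl.)}  & F    & ::= & T\ f \\
\text{(group decl.)}  & G    & ::= & \mathsf{group}\ (gn) \\
\text{(method decl.)} & M    & ::= & T_r\ m(\overline{gp\ \gamma})(\overline{T_x\ x})\ \{\ e\ \} \\
\text{(main meth.)}   & \mathit{main} & ::= & C(\alpha)\ \mathsf{main}(\mathsf{exclusive}\ \alpha)()\ \{\ e\ \} \\
\text{(values)}       & v    & ::= & o \mid \mathsf{null} \\
\text{(references)}   & r    & ::= & x \mid v \\
\text{(group ref.)}   & gr   & ::= & r.gn \mid \alpha \\
\text{(expressions)}  & e    & ::= & a \\
                      &      & \mid & \mathsf{unpackGroupsOf}\ r\ \mathsf{in}\ e \\
                      &      & \mid & \mathsf{let}\ x = e\ \mathsf{in}\ e \\
                      &      & \mid & \mathsf{atomic}\ (gr)\ e \\
                      &      & \mid & \mathsf{split}\ (\overline{gr})\ \mathsf{between}\ e_1 \parallel e_2 \\
                      &      & \mid & \mathsf{inatomic}\ (gr)\ e \\
\text{(atoms)}        & a    & ::= & r \\
                      &      & \mid & r.f \\
                      &      & \mid & r.f := r \\
                      &      & \mid & r.m(\overline{gr})(\overline{r}) \\
                      &      & \mid & \mathsf{new}\ C(\overline{gr})() \\
\text{(types)}        & T    & ::= & C(\overline{gr}) \mid G \\
\text{(object)}       & obj  & ::= & C[\overline{f = v}] \\
\text{(group perm.)}  & gp   & ::= & \mathsf{exclusive} \mid \mathsf{shared} \mid \mathsf{protected} \\
\text{(group state)}  & S    & ::= & U \mid L \\
\text{(class table)}  & CT   & ::= & \bullet \mid CT, (C \mapsto CL)
\end{array}
\]

\[
\begin{array}{lll}
C, D, E \in \text{Classes} & f \in \text{Fields} & \alpha, \beta, \gamma \in \text{Group Vars} \\
gn \in \text{Group Names} & m \in \text{Methods} & x, y, \mathit{this} \in \text{Vars} \\
o \in \text{Obj. Refs.} & &
\end{array}
\]

Fig. 9: µÆMINIUM Grammar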


3.2.  Static  Semantics 

This section first provides an overview of all definition forms, then discusses the
detailed typing rules. We implicitly assume that names of fields, groups and methods in
a class declaration are unique.

3.2.1.  Typing  Context.  The typing context $\Gamma$ contains all the typing information for
object references and data group references. We use G as the type for all data group
references.

\[
\text{(Typing Context)} \quad \Gamma ::= \bullet \mid \Gamma, r : C(\overline{gr}) \mid \Gamma, gr : G
\]

3.2.2.  Permission  Context.  The permission context $\Delta$ is a linear context that keeps track
of the currently available permissions. We write $gr : gp$ to indicate that we have group
permission gp for data group gr.

\[
\text{(Linear Context)} \quad \Delta ::= \bullet \mid \Delta, gr : gp
\]

3.2.3.  Data  Group  Configuration.  The data group configuration $\mathcal{G}$ hierarchically tracks
the data group requirements of an expression, including any ordering or concurrency
among those requirements. It loosely resembles NESL's [Blelloch and Greiner 1996]
approach for tracking profiling information, but instead of tracking operation costs we
track permission requirements. A data group configuration can be empty ($\bullet$);
a collection of group references ($\{\overline{gr}\}$), indicating the permission requirements of the
current expression; the sequential composition of data group configurations ($\oplus$), used
to combine data group configurations of expressions that are sequentially ordered; or
the parallel composition of data group configurations ($\parallel$), used to combine data group
configurations of expressions that are executed in parallel. We also define a global
data group configuration table ($\mathcal{G}T$) which maps class and method tuples to data group
configurations.

\[
\text{(DG configuration)} \quad \mathcal{G} ::= \bullet \mid \{\overline{gr}\} \mid (\mathcal{G}_1 \oplus \mathcal{G}_2) \mid (\mathcal{G}_1 \parallel \mathcal{G}_2)
\]

\[
\text{($\mathcal{G}$ table)} \quad \mathcal{G}T ::= \bullet \mid \mathcal{G}T, ((C, m) \mapsto \mathcal{G})
\]
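As an informal illustration, the four forms of a data group configuration map naturally onto a small algebraic data type. The Python sketch below is not taken from the ÆMINIUM implementation; the class names `Empty`, `Groups`, `Seq` and `Par` are hypothetical, and `required_perms` follows only the one-line description of the requiredPerms helper (the set of all permissions in a configuration).

```python
from dataclasses import dataclass
from typing import FrozenSet, Union


@dataclass(frozen=True)
class Empty:
    """The empty configuration."""


@dataclass(frozen=True)
class Groups:
    """A collection of group references required by the current expression."""
    refs: FrozenSet[str]


@dataclass(frozen=True)
class Seq:
    """Sequential composition of two configurations."""
    first: "Config"
    second: "Config"


@dataclass(frozen=True)
class Par:
    """Parallel composition of two configurations."""
    left: "Config"
    right: "Config"


Config = Union[Empty, Groups, Seq, Par]


def required_perms(config: Config) -> FrozenSet[str]:
    """Collect every group reference that occurs anywhere in the configuration."""
    if isinstance(config, Empty):
        return frozenset()
    if isinstance(config, Groups):
        return config.refs
    if isinstance(config, Seq):
        return required_perms(config.first) | required_perms(config.second)
    if isinstance(config, Par):
        return required_perms(config.left) | required_perms(config.right)
    raise TypeError("not a data group configuration")
```

For instance, a sequential composition of `Groups({"gr0", "gr1"})` and `Groups({"gr0"})` requires both gr0 and gr1.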

Example: Let us consider a simplified example to provide an intuition for how the
data group configuration is used to control execution. Assume we have an
expression $e$, a normal let binding, with a corresponding data group
configuration $\mathcal{G}$. The configuration consists of the sequential composition of the data group
configurations of its subexpressions (i.e., $\mathcal{G} = (\mathcal{G}_1 \oplus \mathcal{G}_2)$, where $\mathcal{G}_1$ and $\mathcal{G}_2$ are the data group
configurations of subexpressions $e_1$ and $e_2$). Furthermore, assume without loss of generality that
the required data groups for those subexpressions are $\mathit{requiredPerms}(\mathcal{G}_1) = \{gr_0, gr_1\}$
and $\mathit{requiredPerms}(\mathcal{G}_2) = \{gr_0\}$.

\[
\begin{array}{l}
\mathcal{G} := (\mathcal{G}_1 \oplus \mathcal{G}_2) \\
\mathit{requiredPerms}(\mathcal{G}_1) = \{gr_0, gr_1\} \\
\mathit{requiredPerms}(\mathcal{G}_2) = \{gr_0\} \\
e := \mathsf{let}\ x = e_1\ \mathsf{in}\ e_2
\end{array}
\]

For the moment consider the simple evaluation judgment $\delta \mid \mathcal{G} \vdash e \mapsto e' \dashv \mathcal{G}'$, meaning:
given the runtime permissions $\delta$ and the expression $e$ with its data group configuration $\mathcal{G}$,
the expression $e$ steps to a new expression $e'$ with its new data group configuration $\mathcal{G}'$.

\[
\{gr_0, gr_1\} \mid \mathcal{G} \vdash \mathsf{let}\ x = e_1\ \mathsf{in}\ e_2 \mapsto \mathsf{let}\ x = e_1'\ \mathsf{in}\ e_2 \dashv \mathcal{G}'
\]

\[
\begin{array}{l}
\mathcal{G}' := (\mathcal{G}_1' \oplus \mathcal{G}_2) \\
\mathit{requiredPerms}(\mathcal{G}_1') = \{gr_1\} \\
\mathit{requiredPerms}(\mathcal{G}_2) = \{gr_0\} \\
e' := \mathsf{let}\ x = e_1'\ \mathsf{in}\ e_2
\end{array}
\]

The first subexpression $e_1$ requires all available runtime permissions, and because of
the sequential composition operator $\oplus$ the runtime system needs to satisfy its requirements
first. Therefore there are no runtime permissions left for the second expression $e_2$.
The system steps $e_1$ to $e_1'$ and updates its data group configuration to $\mathcal{G}_1'$. As shown
above, assume that with this step all remaining operations in $e_1'$ solely depend on the
runtime permission $gr_1$, indicated by $\mathit{requiredPerms}(\mathcal{G}_1') = \{gr_1\}$. In the next execution
step, the runtime system again first needs to satisfy the dependencies of $e_1'$ before $e_2$.
But this time $e_1'$ does not require all available runtime permissions, which allows the
system to provide the remaining runtime permissions to $e_2$. This allows the system to
step $e_1'$ and $e_2$ in parallel as shown below.

\[
\{gr_0, gr_1\} \mid \mathcal{G}' \vdash \mathsf{let}\ x = e_1'\ \mathsf{in}\ e_2 \mapsto \mathsf{let}\ x = e_1''\ \mathsf{in}\ e_2' \dashv \mathcal{G}''
\]

\[
\begin{array}{l}
\mathcal{G}'' := (\mathcal{G}_1'' \oplus \mathcal{G}_2') \\
e'' := \mathsf{let}\ x = e_1''\ \mathsf{in}\ e_2'
\end{array}
\]
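The two steps of this example can be replayed with plain set operations. The following Python sketch is a simplification under our own assumptions: `divide_for_let` and `can_step` are hypothetical names, permissions are modeled as strings, and a subexpression is treated as able to step whenever everything it requires lies within its share of the runtime permissions.

```python
from typing import FrozenSet, Tuple

Perms = FrozenSet[str]


def divide_for_let(delta: Perms, required_first: Perms) -> Tuple[Perms, Perms]:
    """Serve the first subexpression first; the second one gets what is left."""
    delta1 = delta & required_first
    delta2 = delta - delta1
    return delta1, delta2


def can_step(required: Perms, available: Perms) -> bool:
    """Simplifying assumption: stepping needs all required permissions."""
    return required <= available


delta = frozenset({"gr0", "gr1"})

# First step: e1 needs {gr0, gr1}, so nothing is left for e2.
step1_first, step1_second = divide_for_let(delta, frozenset({"gr0", "gr1"}))

# Second step: e1' only needs {gr1}, so gr0 becomes available to e2.
step2_first, step2_second = divide_for_let(delta, frozenset({"gr1"}))
```

In the second step both shares are sufficient, which corresponds to stepping both subexpressions of the let binding at once.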


3.2.4.  Typing  Judgments.  We type-check an expression with the judgment
$\Gamma \mid \Sigma \mid \Delta \vdash_C e : T \mid \mathcal{G}$, which reads: given the typing context $\Gamma$, the store typing $\Sigma$,
and the permission context $\Delta$, the expression $e$ checks in the context of class $C$ with
type $T$ and has data group configuration $\mathcal{G}$.

We use the judgment $T_f\ f\ \mathit{ok\ in}\ C$ to check that the given field declaration is valid in
class C.

We use the judgment $T_r\ m(\overline{gp\ \gamma})(\overline{T_x\ x})\ \{\ e\ \}\ \mathit{ok\ in}\ C$ to check that the method
declaration is valid in class C.

3.2.5.  Helper  Functions.  Throughout the typing and evaluation rules we use several
helper functions to abbreviate common functionality. For space reasons we delegate
the full definitions of these functions to a companion Technical Report (submitted as
supplementary material) and just provide a short overview of their effects in Figure
10.

$\mathit{fields}(C) = \overline{F}$ : returns the fields of class C and its superclasses
$\mathit{groupDecls}(C) = \overline{gn}$ : returns the declared groups of class C and its superclasses
$\mathit{override}(C, m)\ \mathit{ok}$ : checks if a method correctly overrides another method
$\mathit{requiredPerms}(\mathcal{G}) = \overline{gr}$ : returns the set of all permissions in $\mathcal{G}$
$\mathit{requiredTokens}(e) = \{\overline{gr@L}\}$ : returns the set of group access tokens for which e contains a corresponding inatomic
$\mathit{mdecl}(C, m) = M$ : looks up the method declaration of m in class C
$\mathit{mbody}(C, m) = \overline{\gamma}.\overline{x}.e \times \mathcal{G}$ : looks up the method body of m in class C, and returns the body expression with the method parameter names and the data group configuration

Fig. 10: µÆMINIUM Helper Functions

3.2.6.  Typing  Rules.  The typing rules are shown in Figure 11. Most rules are
straightforward; we highlight the most interesting ones. T-Program starts the checking with
a top-level data group $\alpha$. The T-UnpackGroupsIn-* rules exchange a permission to
the data group of an object for a permission to the inner groups of that object. If
we have a unique permission to the receiver object we get exclusive group
permissions (T-UnpackGroupsIn-Exclusive); in all other cases we get shared group
permissions (T-UnpackGroupsIn-Shared). We could always unpack inner group
permissions to shared group permissions, but making the distinction allows us to avoid
unnecessary synchronization overhead when we know that we do not need it
(i.e., in the case of a unique object). T-Split splits the incoming permission context
in two, duplicating the named shared permissions, while T-Atomic allows the
protected expression to treat a shared data group as protected. T-Let supports sequential
composition, as specified by the group configuration $\mathcal{G}_1 \oplus \mathcal{G}_2$, while T-Split specifies
parallel use of any shared groups, as specified by the group configuration $\mathcal{G}_1 \parallel \mathcal{G}_2$.
T-Field-Read and T-Field-Assign require an exclusive or protected permission to the
first data group parameter ($gr_0$) of the object being read or assigned. This ensures that
either a data group is unshared, or it is locked with an atomic section before being
used. Field reads and writes generate a data group configuration that is just the group
being read or assigned. Finally, T-Call ensures that the data groups required by the
called function are provided by the caller. For a more detailed description of each rule
cf. [Stork et al. 2012a].
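The interplay between the atomic rule and the field access rules can be illustrated with a very small checker. The Python sketch below is an assumption-laden simplification, not the µÆMINIUM type checker: the permission context is modeled as a dictionary from data group names to permission kinds, and `check_field_access`, `enter_atomic` and `MissingPermission` are hypothetical names.

```python
from typing import Dict


class MissingPermission(Exception):
    """Raised when the permission context does not allow an operation."""


def check_field_access(delta: Dict[str, str], owner_group: str) -> None:
    """Field reads and writes need an exclusive or protected permission
    to the data group that owns the object."""
    kind = delta.get(owner_group)
    if kind not in ("exclusive", "protected"):
        raise MissingPermission(
            "field access needs exclusive or protected permission to "
            + owner_group
        )


def enter_atomic(delta: Dict[str, str], group: str) -> Dict[str, str]:
    """Inside atomic (group), a shared permission is treated as protected."""
    if delta.get(group) != "shared":
        raise MissingPermission("atomic needs a shared permission to " + group)
    inner = dict(delta)
    inner[group] = "protected"
    return inner
```

With a context such as `{"internal": "shared"}`, a direct field access is rejected, while the same access is accepted in the context returned by `enter_atomic`. With `{"internal": "exclusive"}` no atomic block is needed.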

3.3.  Dynamic  Semantics 

This  section  first  provides  an  overview  of  the  definition  forms  used,  then  discusses  the 
evaluation  rules  in  detail.  Instead  of  generating  an  explicit  dataflow  graph,  the  dy- 




ÆMINIUM: A Permission Based Concurrent-by-Default Programming Language Approach


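T-Program
\[
\frac{\begin{array}{c}
\overline{CL}\ \mathit{ok} \qquad \mathit{main} = C(\alpha)\ \mathsf{main}(\mathsf{exclusive}\ \alpha)()\ \{\ e\ \} \\
(\alpha : G) \mid \bullet \mid (\alpha : \mathsf{exclusive}) \vdash e : T \mid \mathcal{G} \qquad T <: C(\alpha)
\end{array}}
{(\overline{CL},\ \mathit{main}) : C(\alpha)}
\]

T-Method
\[
\frac{\begin{array}{c}
CT(C) = \mathsf{class}\ C(\alpha, \overline{\beta})\ \mathsf{extends}\ D(\alpha)\ \{\overline{G}\ \overline{F}\ \overline{M}\} \qquad \mathit{override}(C, m)\ \mathit{ok} \\
\Gamma = (\mathit{this} : C(\alpha, \overline{\beta}),\ \alpha : G,\ \overline{\beta : G},\ \overline{\gamma : G}) \\
\Gamma \vdash \ldots\ \mathit{ok} \qquad \Gamma, (\overline{x : T_x}) \mid \bullet \mid (\overline{\gamma : gp}) \vdash_C e : T_e \mid \mathcal{G} \qquad T_e <: T_r
\end{array}}
{T_r\ m(\overline{gp\ \gamma})(\overline{T_x\ x})\ \{\ e\ \}\ \mathit{ok\ in}\ C}
\]

T-Class
\[
\frac{\overline{M}\ \mathit{ok\ in}\ C \qquad \overline{F}\ \mathit{ok\ in}\ C}
{\mathsf{class}\ C(\alpha, \overline{\beta})\ \mathsf{extends}\ D(\alpha)\ \{\overline{G}\ \overline{F}\ \overline{M}\}\ \mathit{ok}}
\]

T-Field
\[
\frac{\begin{array}{c}
CT(C) = \mathsf{class}\ C(\alpha, \overline{\beta})\ \mathsf{extends}\ D(\alpha)\ \{\overline{G}\ \overline{F}\ \overline{M}\} \\
(\alpha : G,\ \overline{\beta : G},\ \mathit{this} : C(\alpha, \overline{\beta}),\ \ldots : G) \vdash T_f\ \mathit{ok}
\end{array}}
{T_f\ f\ \mathit{ok\ in}\ C}
\]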

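T-Call
\[
\frac{\begin{array}{c}
\Gamma \mid \Sigma \vdash r : T_r,\ \overline{p : T_p},\ \overline{gr : G} \\
\Delta \vdash \overline{gr : gp} \qquad T_r = D(\overline{gr_o}) \\
CT(D) = \mathsf{class}\ D(\alpha, \overline{\beta})\ \mathsf{extends}\ E(\alpha)\ \{\overline{G}\ \overline{F}\ \overline{M}\} \\
\mathit{mdecl}(D, m) = T_{result}\ m(\overline{gp\ \gamma})(\overline{T_x\ x})\ \{\ e\ \} \\
\overline{T_p} <: [\overline{gr}, \overline{gr_o} / \ldots]\,\overline{T_x} \qquad T_r <: [\overline{gr}, \overline{gr_o} / \ldots]\,D(\alpha, \overline{\beta})
\end{array}}
{\Gamma \mid \Sigma \mid \Delta \vdash_C r.m(\overline{gr})(\overline{p}) : [\overline{gr}, \overline{gr_o} / \ldots]\,T_{result} \mid \{\overline{gr}\}}
\]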

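T-UnpackGroupsIn-Exclusive
\[
\frac{\begin{array}{c}
\Gamma \mid \Sigma \vdash r : C(\overline{gr}) \qquad \Delta = \Delta', (gr_0 : \mathsf{exclusive}) \qquad \mathit{groupDecls}(C) = \overline{gn} \\
\Gamma, (\overline{r.gn : G}) \mid \Sigma \mid \Delta', (\overline{r.gn : \mathsf{exclusive}}) \vdash_C e : T \mid \mathcal{G}
\end{array}}
{\Gamma \mid \Sigma \mid \Delta \vdash_C \mathsf{unpackGroupsOf}\ r\ \mathsf{in}\ e : T \mid (\{gr_0, \overline{r.gn}\} \oplus \mathcal{G})}
\]

T-UnpackGroupsIn-Shared
\[
\frac{\begin{array}{c}
\Gamma \mid \Sigma \vdash r : C(\overline{gr}) \qquad \Delta = \Delta', (gr_0 : gp) \qquad gp \in \{\mathsf{shared}, \mathsf{protected}\} \\
\mathit{groupDecls}(C) = \overline{gn} \qquad \Gamma, (\overline{r.gn : G}) \mid \Sigma \mid \Delta', (\overline{r.gn : \mathsf{shared}}) \vdash_C e : T \mid \mathcal{G}
\end{array}}
{\Gamma \mid \Sigma \mid \Delta \vdash_C \mathsf{unpackGroupsOf}\ r\ \mathsf{in}\ e : T \mid (\{gr_0, \overline{r.gn}\} \oplus \mathcal{G})}
\]

T-Split
\[
\frac{\begin{array}{c}
\{\overline{gp}\} \subseteq \{\mathsf{exclusive}, \mathsf{shared}\} \qquad \Delta = \Delta_1, \Delta_2, \Delta_R \\
\Gamma \mid \Sigma \mid (\Delta_1, \overline{gr : \mathsf{shared}}) \vdash_C e_1 : T_1 \mid \mathcal{G}_1 \\
\Gamma \mid \Sigma \mid (\Delta_2, \overline{gr : \mathsf{shared}}) \vdash_C e_2 : T_2 \mid \mathcal{G}_2 \\
\mathcal{G} = (\mathcal{G}_1 \parallel \mathcal{G}_2)
\end{array}}
{\Gamma \mid \Sigma \mid (\Delta, \overline{gr : gp}) \vdash_C \mathsf{split}\ (\overline{gr})\ \mathsf{between}\ e_1 \parallel e_2 : \bot \mid \mathcal{G}}
\]

T-Atomic
\[
\frac{\Gamma \mid \Sigma \vdash gr : G \qquad \Gamma \mid \Sigma \mid (\Delta, gr : \mathsf{protected}) \vdash_C e : T \mid \mathcal{G}}
{\Gamma \mid \Sigma \mid \Delta, (gr : \mathsf{shared}) \vdash_C \mathsf{atomic}\ (gr)\ e : T \mid (\{gr\} \oplus \mathcal{G})}
\]

T-InAtomic
\[
\frac{\Gamma \mid \Sigma \vdash gr : G \qquad \Gamma \mid \Sigma \mid \Delta, (gr : \mathsf{protected}) \vdash_C e : T \mid \mathcal{G}}
{\Gamma \mid \Sigma \mid \Delta, (gr : \mathsf{shared}) \vdash_C \mathsf{inatomic}\ (gr)\ e : T \mid (\{gr\} \oplus \mathcal{G})}
\]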


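T-Let
\[
\frac{\Gamma \mid \Sigma \mid \Delta_1 \vdash_C e_1 : T_1 \mid \mathcal{G}_1 \qquad (\Gamma, x : T_1) \mid \Sigma \mid \Delta_1, \Delta_R \vdash_C e_2 : T_2 \mid \mathcal{G}_2}
{\Gamma \mid \Sigma \mid \Delta_1, \Delta_R \vdash_C \mathsf{let}\ x = e_1\ \mathsf{in}\ e_2 : T_2 \mid (\mathcal{G}_1 \oplus \mathcal{G}_2)}
\]

T-Reference
\[
\frac{\Gamma \mid \Sigma \vdash r : D(\overline{gr})}
{\Gamma \mid \Sigma \mid \Delta \vdash_C r : D(\overline{gr}) \mid \bullet}
\]

T-Field-Read
\[
\frac{\begin{array}{c}
\Gamma \mid \Sigma \vdash r : D(\overline{gr}),\ gr_0 : G \qquad gp \in \{\mathsf{exclusive}, \mathsf{protected}\} \\
\mathit{fields}(D) = \overline{T_f\ f}
\end{array}}
{\Gamma \mid \Sigma \mid \Delta, (gr_0 : gp) \vdash_C r.f_i : T_{f_i} \mid \{gr_0\}}
\]

T-Field-Assign
\[
\frac{\begin{array}{c}
\Gamma \mid \Sigma \vdash r_v : T_v,\ r : D(\overline{gr}),\ gr_0 : G \qquad gp \in \{\mathsf{exclusive}, \mathsf{protected}\} \\
\mathit{fields}(D) = \overline{T_f\ f} \qquad T_v <: T_{f_i}
\end{array}}
{\Gamma \mid \Sigma \mid \Delta, (gr_0 : gp) \vdash_C r.f_i := r_v : T_v \mid \{gr_0\}}
\]

T-New
\[
\frac{CT(D) = \mathsf{class}\ D(\alpha, \overline{\beta})\ \mathsf{extends}\ E(\alpha)\ \{\overline{G}\ \overline{F}\ \overline{M}\} \qquad \Gamma \mid \Sigma \vdash \overline{gr : G}}
{\Gamma \mid \Sigma \mid \Delta \vdash_C \mathsf{new}\ D(\overline{gr})() : [\overline{gr} / \alpha\overline{\beta}]\,D(\alpha, \overline{\beta}) \mid \bullet}
\]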


Fig. 11: Static Semantics of µÆMINIUM.


namic  semantics  uses  the  data  group  configuration  together  with  runtime  permission 
tokens  to  model  the  permission  flow  at  runtime  and  emulate  the  dependencies. 
3.3.1.  Store.  The store $\mu$ is a mapping of object references $o$ to objects $obj$. A store can
either be a potentially empty set of object mappings or race, which indicates the case
that a race condition occurred during the execution (our soundness theorem will show
that these races cannot occur in well-typed code). An object is a record consisting of all
instance fields. The inner groups (i.e., data groups that are declared by every object)
along with their corresponding state are managed separately in the group access token
context (cf. Section 3.3.3).

\[
\text{(store)} \quad \mu ::= \overline{(o \mapsto obj)} \mid \mathsf{race}
\]

During the evaluation of an expression, differential stores ($\mu_\delta$) containing the
accessed objects are generated. Those differential stores are merged via the $\uplus$ operator.
To generate a new global heap we write $\mu' = [\mu_\delta]\mu$ for element-wise update/substitution
of objects.

\[
\mu_{\delta_1} \uplus \mu_{\delta_2} =
\begin{cases}
\mu_{\delta_1}, \mu_{\delta_2} & \text{if } \mathit{dom}(\mu_{\delta_1}) \cap \mathit{dom}(\mu_{\delta_2}) = \bullet \\
\mathsf{race} & \text{otherwise}
\end{cases}
\]

\[
\mu' = [\mu_\delta]\mu =
\begin{cases}
\mathsf{race} & \text{if } \mu_\delta = \mathsf{race} \\
{[o \mapsto obj]}\mu & \forall (o \mapsto obj) \in \mu_\delta
\end{cases}
\]
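The two store operations can be sketched with ordinary dictionaries. The Python code below is an illustration under our own assumptions: `RACE`, `merge` and `apply_diff` are hypothetical names, objects are modeled as dictionaries of field values, and propagating an already existing race through `merge` is our reading of the "otherwise" case rather than something stated explicitly.

```python
from typing import Any, Dict

RACE: Any = object()  # sentinel standing in for the 'race' store


def merge(diff1: Any, diff2: Any) -> Any:
    """Merge two differential stores; overlapping domains mean a race."""
    if diff1 is RACE or diff2 is RACE:
        return RACE  # assumption: an existing race is propagated
    if diff1.keys() & diff2.keys():
        return RACE
    merged: Dict[str, dict] = dict(diff1)
    merged.update(diff2)
    return merged


def apply_diff(store: Any, diff: Any) -> Any:
    """Build the new global store by element-wise update with the diff."""
    if store is RACE or diff is RACE:
        return RACE
    updated: Dict[str, dict] = dict(store)
    updated.update(diff)
    return updated
```

Two parallel steps that touch different objects merge into one differential store, while two steps that touch the same object reference produce `RACE`, which then replaces the whole global store.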


3.3.2.  Runtime  Permission  Context.  The runtime permission context $\delta$ is used to model
permission flows at runtime and is either empty or consists of a set of $o.gn$ (i.e.,
runtime permissions). The runtime semantics do not allow an expression to execute until
all of its required permissions, as expressed in its group configuration, are available.
A runtime permission can be split and can flow along different paths, just as static
permissions can.

The top level permission context always contains only one initial permission to the
global data group of the main function. More runtime permissions are successively
generated by unpacking inner groups.

\[
\text{(runtime permission context)} \quad \delta ::= \bullet \mid \delta, o.gn
\]

3.3.3.  Group  Access  Token  Context.  The group token context $\Psi$ is a set of group access
tokens, i.e., group references along with their current locking state $S = \{U \mid L\}$. A locking
state $U$ indicates an unlocked state, meaning that one atomic block referring to that
data group can be entered. A locking state $L$ indicates a locked state, meaning that an
atomic block referring to that data group is currently executing. There is an ongoing
debate [Boehm 2009] regarding the correct semantics for atomic blocks. Some
argue that transactional semantics should be used while others argue that lock-based
semantics should be used. We decided to use a lock-based approach for its simplicity
of implementation and semantics. In the future we might reconsider this decision and
evaluate a transactional semantics [Moore and Grossman 2008].

There exists exactly one group access token for every data group in the system and,
unlike runtime permissions, group access tokens cannot be split. In several rules the
unlocked group access token context is split in a non-deterministic way. This models
the non-determinism of how atomic blocks can lock data groups. Locked group access
tokens are forced to flow into the expression that contains the corresponding inatomic.
This approach is not strictly necessary but allows us to formulate a stronger preservation
induction hypothesis.

\[
\text{(group context)} \quad \Psi ::= \bullet \mid \Psi, o.gn@S
\]
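A lock-based reading of the group access tokens can be sketched as a table with one state per data group. The Python code below is a hypothetical, single-threaded illustration: `TokenContext` and its methods are our own names, and the sketch only tracks the U/L state changes, not the non-deterministic distribution of tokens among subexpressions.

```python
from typing import Dict

UNLOCKED = "U"
LOCKED = "L"


class TokenContext:
    """Holds exactly one access token, with a lock state, per data group."""

    def __init__(self) -> None:
        self._state: Dict[str, str] = {}

    def declare(self, group: str) -> None:
        """A newly created data group starts with an unlocked token."""
        if group in self._state:
            raise ValueError("token already exists for " + group)
        self._state[group] = UNLOCKED

    def try_enter_atomic(self, group: str) -> bool:
        """Enter an atomic block if the token is unlocked; otherwise refuse."""
        if self._state[group] == UNLOCKED:
            self._state[group] = LOCKED
            return True
        return False

    def leave_atomic(self, group: str) -> None:
        """Leaving the block returns the token to the unlocked state."""
        if self._state[group] != LOCKED:
            raise ValueError(group + " is not locked")
        self._state[group] = UNLOCKED

    def state(self, group: str) -> str:
        return self._state[group]
```

A second attempt to enter an atomic block on the same group fails until the first block has been left, which is the mutual exclusion the locked state is meant to express.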

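3.3.4.  Evaluation  Judgment.  To evaluate expressions we use the judgment
$\mu \mid \delta \mid \Psi \mid \mathcal{G} \vdash e \mapsto e' \dashv \mu_\delta \mid \Psi' \mid \mathcal{G}'$, which reads as follows: given the store ($\mu$),
the runtime permissions ($\delta$), the group access tokens ($\Psi$), and the data group
configuration ($\mathcal{G}$), the expression $e$ steps to $e'$ and produces a differential store ($\mu_\delta$),
an updated set of group access tokens ($\Psi'$), and an updated data group configuration ($\mathcal{G}'$).

E-Trans-Z
\[
\frac{}{(\mu \mid \delta \mid \Psi \mid \mathcal{G} \mid e) \to^{*} (\mu \mid \delta \mid \Psi \mid \mathcal{G} \mid e)}
\]

E-Trans-N
\[
\frac{\begin{array}{c}
\mu \mid \delta \mid \Psi \mid \mathcal{G} \vdash e \mapsto e_1 \dashv \mu_\delta \mid \Psi_1 \mid \mathcal{G}_1 \qquad \mu_1 = [\mu_\delta]\mu \\
(\mu_1 \mid \delta \mid \Psi_1 \mid \mathcal{G}_1 \mid e_1) \to^{*} (\mu' \mid \delta \mid \Psi' \mid \mathcal{G}' \mid e')
\end{array}}
{(\mu \mid \delta \mid \Psi \mid \mathcal{G} \mid e) \to^{*} (\mu' \mid \delta \mid \Psi' \mid \mathcal{G}' \mid e')}
\]

Fig. 12: µÆMINIUM Program State Transition Rules

E-Field-Read
\[
\frac{\mathcal{G} = \{v_g.gn\} \qquad v_g.gn \in \delta \qquad \mu \vdash (v \mapsto C[\overline{f = v_f}]) \qquad \mu_\delta = (v \mapsto C[\overline{f = v_f}])}
{\mu \mid \delta \mid \Psi \mid \mathcal{G} \vdash v.f_i \mapsto v_{f_i} \dashv \mu_\delta \mid \Psi \mid \bullet}
\]

E-Field-Assign
\[
\frac{\begin{array}{c}
\mathcal{G} = \{v_g.gn\} \qquad v_g.gn \in \delta \qquad \mu \vdash (v_r \mapsto obj_r) \\
obj_r = C[\ldots, f_{r_i} = v_{f_i}, \ldots] \qquad obj_r' = C[\ldots, f_{r_i} = v_v, \ldots] \qquad \mu_\delta = (v_r \mapsto obj_r')
\end{array}}
{\mu \mid \delta \mid \Psi \mid \mathcal{G} \vdash v_r.f_{r_i} := v_v \mapsto v_v \dashv \mu_\delta \mid \Psi \mid \bullet}
\]

E-New
\[
\frac{\mathcal{G} = \bullet \qquad \mathit{groupDecls}(C) = \overline{gn} \qquad o_{new}\ \text{fresh} \qquad \mu_\delta = (o_{new} \mapsto C[\overline{f = \mathsf{null}}])}
{\mu \mid \delta \mid \Psi \mid \mathcal{G} \vdash \mathsf{new}\ C(\overline{v_g.gn})() \mapsto o_{new} \dashv \mu_\delta \mid \Psi, \overline{o_{new}.gn@U} \mid \bullet}
\]

E-Call
\[
\frac{\begin{array}{c}
\mathcal{G} = \{\overline{v_g.gn}\} \qquad \overline{v_g.gn} \in \delta \qquad \mu \vdash (v_r \mapsto C[\overline{f = v_{f_r}}]) \\
\mathit{mbody}(C, m) = \overline{\gamma}.\overline{x}.e \times \mathcal{G}_e \qquad \mathcal{G}' = [\overline{v_g.gn}/\overline{\gamma}][\overline{v_p}/\overline{x}][v_r/\mathit{this}]\,\mathcal{G}_e
\end{array}}
{\mu \mid \delta \mid \Psi \mid \mathcal{G} \vdash v_r.m(\overline{v_g.gn})(\overline{v_p}) \mapsto [\overline{v_g.gn}/\overline{\gamma}][\overline{v_p}/\overline{x}][v_r/\mathit{this}]\,e \dashv \bullet \mid \Psi \mid \mathcal{G}'}
\]

Fig. 13: Dynamic Semantics of µÆMINIUM Atoms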

3.3.5.  Program  State.  A program state is a quintuple of the form $(\mu \mid \delta \mid \Psi \mid \mathcal{G} \mid e)$, consisting
of a store ($\mu$), a runtime permission context ($\delta$), a group access token context ($\Psi$) of
available tokens, a data group configuration ($\mathcal{G}$), and an expression ($e$). A program
state represents a consistent state of the execution. To transition from one program
state to another, the expression takes a step following the evaluation judgment and
then generates a new global store (see E-Trans-N in Figure 12).

3.3.6.  Evaluation  Rules.  The evaluation rules for atoms are shown in Figure 13 and the
rules for expressions are shown in Figure 14. Once again we describe the most
interesting rules. E-Field-Read demonstrates the basic approach: we look up the
permissions required based on the group configuration $\mathcal{G}$ (which was computed by the
typechecking rules), and the read cannot execute unless and until the required permission is in
the permission context $\delta$. Other atom rules are similar. The E-UnpackGroupsOf-*
rules make the inner permissions available to the enclosed expression if and only if the
permission to the outer object is available; otherwise the enclosed expression can only
take steps for which these permissions are not required. There are three variants of the
let and split rules: one where the first expression takes a step, one where the second
steps, and one where both expressions step (this can occur even in the sequentializing
let construct if the permissions required do not overlap). The rules for split differ in
that let divides the permissions without duplicating any, while split duplicates the
permissions named in the split block. Finally, the rules for the atomic block do not
pass a permission to the named data group inwards until a lock is acquired, at which
point the state of the lock changes to $L$ and the expression changes to inatomic for
tracking purposes. For a more detailed description of each rule cf. [Stork et al. 2012a].
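The way split hands out runtime permissions can be sketched with set intersections. The Python code below is a simplified illustration under our own assumptions: `divide_for_split` is a hypothetical name, permissions are plain strings, and each branch simply receives the part of the available permissions that its own configuration requires.

```python
from typing import FrozenSet, Tuple

Perms = FrozenSet[str]


def divide_for_split(
    delta: Perms, required_left: Perms, required_right: Perms
) -> Tuple[Perms, Perms]:
    """Each branch of a split gets every available permission it requires,
    so a permission required by both branches is handed to both."""
    return delta & required_left, delta & required_right


available = frozenset({"o.alpha", "o.beta"})
left, right = divide_for_split(
    available,
    frozenset({"o.alpha"}),
    frozenset({"o.alpha", "o.beta"}),
)
```

Here `o.alpha` ends up in both shares. Under the let rules the second subexpression would only receive what the first one does not require, so no permission is ever held twice.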
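E-UnpackGroupsOf-Replace
\[
\frac{\begin{array}{c}
\mathcal{G} = (\{v'.gn', \overline{v_r.gn}\} \oplus \mathcal{G}_e) \qquad \delta = \delta', v'.gn' \\
\mu \mid \delta', \overline{v_r.gn} \mid \Psi \mid \mathcal{G}_e \vdash e \mapsto e' \dashv \mu_\delta \mid \Psi' \mid \mathcal{G}_e' \qquad \mathcal{G}' = (\{v'.gn', \overline{v_r.gn}\} \oplus \mathcal{G}_e')
\end{array}}
{\mu \mid \delta \mid \Psi \mid \mathcal{G} \vdash \mathsf{unpackGroupsOf}\ v_r\ \mathsf{in}\ e \mapsto \mathsf{unpackGroupsOf}\ v_r\ \mathsf{in}\ e' \dashv \mu_\delta \mid \Psi' \mid \mathcal{G}'}
\]

E-UnpackGroupsOf-None
\[
\frac{\begin{array}{c}
\mathcal{G} = (\{v'.gn', \overline{v_r.gn}\} \oplus \mathcal{G}_e) \qquad v'.gn' \notin \delta \\
\mu \mid \delta \mid \Psi \mid \mathcal{G}_e \vdash e \mapsto e' \dashv \mu_\delta \mid \Psi' \mid \mathcal{G}_e' \qquad \mathcal{G}' = (\{v'.gn', \overline{v_r.gn}\} \oplus \mathcal{G}_e')
\end{array}}
{\mu \mid \delta \mid \Psi \mid \mathcal{G} \vdash \mathsf{unpackGroupsOf}\ v_r\ \mathsf{in}\ e \mapsto \mathsf{unpackGroupsOf}\ v_r\ \mathsf{in}\ e' \dashv \mu_\delta \mid \Psi' \mid \mathcal{G}'}
\]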

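E-Let-1
\[
\frac{\begin{array}{c}
\mathcal{G} = (\mathcal{G}_1 \oplus \mathcal{G}_2) \qquad \delta_1 = \delta \cap \mathit{requiredPerms}(\mathcal{G}_1) \\
\Psi = \Psi_1, \Psi_2 \qquad \mathit{requiredTokens}(e_1) \subseteq \Psi_1 \\
\mu \mid \delta_1 \mid \Psi_1 \mid \mathcal{G}_1 \vdash e_1 \mapsto e_1' \dashv \mu_\delta \mid \Psi_1' \mid \mathcal{G}_1' \\
\mathcal{G}' = (\mathcal{G}_1' \oplus \mathcal{G}_2) \qquad \Psi' = \Psi_1' \cup \Psi_2
\end{array}}
{\mu \mid \delta \mid \Psi \mid \mathcal{G} \vdash \mathsf{let}\ x = e_1\ \mathsf{in}\ e_2 \mapsto \mathsf{let}\ x = e_1'\ \mathsf{in}\ e_2 \dashv \mu_\delta \mid \Psi' \mid \mathcal{G}'}
\]

E-Let-2
\[
\frac{\begin{array}{c}
\mathcal{G} = (\mathcal{G}_1 \oplus \mathcal{G}_2) \qquad \delta_2 = \delta - \mathit{requiredPerms}(\mathcal{G}_1) \\
\Psi = \Psi_1, \Psi_2 \qquad \mathit{requiredTokens}(e_1) \subseteq \Psi_1 \qquad \mathit{requiredTokens}(e_2) \subseteq \Psi_2 \\
\mu \mid \delta_2 \mid \Psi_2 \mid \mathcal{G}_2 \vdash e_2 \mapsto e_2' \dashv \mu_\delta \mid \Psi_2' \mid \mathcal{G}_2' \\
\Psi' = \Psi_1 \cup \Psi_2' \qquad \mathcal{G}' = (\mathcal{G}_1 \oplus \mathcal{G}_2')
\end{array}}
{\mu \mid \delta \mid \Psi \mid \mathcal{G} \vdash \mathsf{let}\ x = e_1\ \mathsf{in}\ e_2 \mapsto \mathsf{let}\ x = e_1\ \mathsf{in}\ e_2' \dashv \mu_\delta \mid \Psi' \mid \mathcal{G}'}
\]

E-Let-12
\[
\frac{\begin{array}{c}
\mathcal{G} = (\mathcal{G}_1 \oplus \mathcal{G}_2) \qquad \delta_1 = \delta \cap \mathit{requiredPerms}(\mathcal{G}_1) \qquad \delta_2 = \delta - \delta_1 \\
\Psi = \Psi_1, \Psi_2 \qquad \mathit{requiredTokens}(e_1) \subseteq \Psi_1 \qquad \mathit{requiredTokens}(e_2) \subseteq \Psi_2 \\
\mu \mid \delta_1 \mid \Psi_1 \mid \mathcal{G}_1 \vdash e_1 \mapsto e_1' \dashv \mu_{\delta_1} \mid \Psi_1' \mid \mathcal{G}_1' \\
\mu \mid \delta_2 \mid \Psi_2 \mid \mathcal{G}_2 \vdash e_2 \mapsto e_2' \dashv \mu_{\delta_2} \mid \Psi_2' \mid \mathcal{G}_2' \\
\Psi' = \Psi_1' \cup \Psi_2' \qquad \mathcal{G}' = (\mathcal{G}_1' \oplus \mathcal{G}_2') \qquad \mu_\delta = \mu_{\delta_1} \uplus \mu_{\delta_2}
\end{array}}
{\mu \mid \delta \mid \Psi \mid \mathcal{G} \vdash \mathsf{let}\ x = e_1\ \mathsf{in}\ e_2 \mapsto \mathsf{let}\ x = e_1'\ \mathsf{in}\ e_2' \dashv \mu_\delta \mid \Psi' \mid \mathcal{G}'}
\]

E-Let-Value
\[
\frac{\mathcal{G} = (\bullet \oplus \mathcal{G}_2) \qquad \mathcal{G}' = [v/x]\,\mathcal{G}_2}
{\mu \mid \delta \mid \Psi \mid \mathcal{G} \vdash \mathsf{let}\ x = v\ \mathsf{in}\ e_2 \mapsto [v/x]\,e_2 \dashv \bullet \mid \Psi \mid \mathcal{G}'}
\]

E-UnpackGroupsOf-Value
\[
\frac{}{\mu \mid \delta \mid \Psi \mid \mathcal{G} \vdash \mathsf{unpackGroupsOf}\ v_r\ \mathsf{in}\ v \mapsto v \dashv \bullet \mid \Psi \mid \bullet}
\]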


E-Split-1 

g  =  (gi  II  ^2)  di  =  8  Pi  requiredPerms(Gi)  \I>  =  ^1,^2  requiredT  ok  ens(e  1)  C  \I>i 

requiredT okens{e,2 )  C  n\8i  I'Ll  |gi  P  ei  1— )•  H  /X5  |\L}  | G[  V  =  \L}  U  \L2  =  (gx  ||  G2) 

g,\8\^\G  P  split  ( v.gn )  between  e±  ||  e2  1— >  split  (v.gn)  between  e}  ||  e2  H 

E-Split-2 

^  =  (G 1  ||  G2 )  <*>2  =  <5  n  requiredPerms(g2)  SSf  =  ^1,^2  requiredT okens(ei)  C 

requiredT okens{e2)  C  mI^2  1^2 1^2  P  e2  1— >■  H  ^<5 \^'2 =  ^1  U  ^'2  G'  —  (Gi  ||  G'2) 

li\8\*\g  P  split  (v.gn)  between  e±  ||  e2  1— >  split  (v.gn)  between  ei  ||  e2  H  fJ^s  I'L  | G 


E-SPLIT-12 

^  =  (C/i  ||  ^2)  5i  =  5  fl  requiredPerms(Gi )  £2  =  8  n  requiredPerms(G 2) 

^  =  ^1,^2  requiredT  okens(e\)  C  requiredT  ok  ens(e2)  C  m|^i  |^i  |^i  P  ei  1— >■  H  |^^  |^^ 

^1^2 1^2 1^2  P  e2  e2  H  ns2  1^21^2  A4 5  =  A^x  ©  M52  =  'F'1U^2  G'  =  (G[  ||  G'2) 

/i|5|^|^  P  split  (v.gn)  between  ex  ||  e2  1— >•  split  (v.gn)  between  ex  ||  e2  H  /xs\^  \G 


Fig.  14:  Dynamic  Semantics  of  A^EMINIUM  Expressions  [1/2] 


3.4.  Proof 

We prove the correctness of our system by induction on the derivation of the program
state transition rules (cf. Figure 12). We prove type safety following the standard
approach [Pierce 2002] by proving progress and preservation separately.

Our  definition  of  correctness  means  that  every  well  formed  program  is  free  of  data 
races. As outlined in Section 2.2.2, ÆMINIUM currently does not handle deadlocks.
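E-Atomic-Step1
\[
\frac{\begin{array}{c}
\mathcal{G} = (\{v.gn\} \oplus \mathcal{G}_e) \qquad v.gn \notin \delta \\
\mu|\delta|\Psi|\mathcal{G}_e \vdash e \mapsto e' \dashv \mu_\delta|\Psi'|\mathcal{G}_e' \qquad \mathcal{G}' = (\{v.gn\} \oplus \mathcal{G}_e')
\end{array}}
{\mu|\delta|\Psi|\mathcal{G} \vdash \mathtt{atomic}\langle v.gn\rangle\ e \mapsto \mathtt{atomic}\langle v.gn\rangle\ e' \dashv \mu_\delta|\Psi'|\mathcal{G}'}
\]

E-Atomic-Step2
\[
\frac{\begin{array}{c}
\mathcal{G} = (\{v.gn\} \oplus \mathcal{G}_e) \qquad \delta = \delta', v.gn \qquad v.gn@U \notin \Psi \\
\mu|\delta'|\Psi|\mathcal{G}_e \vdash e \mapsto e' \dashv \mu_\delta|\Psi'|\mathcal{G}_e' \qquad \mathcal{G}' = (\{v.gn\} \oplus \mathcal{G}_e')
\end{array}}
{\mu|\delta|\Psi|\mathcal{G} \vdash \mathtt{atomic}\langle v.gn\rangle\ e \mapsto \mathtt{atomic}\langle v.gn\rangle\ e' \dashv \mu_\delta|\Psi'|\mathcal{G}'}
\]

E-Atomic-InAtomic
\[
\frac{\mathcal{G} = (\{v.gn\} \oplus \mathcal{G}_e) \qquad v.gn \in \delta \qquad \Psi = \Psi'', v.gn@U \qquad \Psi' = \Psi'', v.gn@L}
{\mu|\delta|\Psi|\mathcal{G} \vdash \mathtt{atomic}\langle v.gn\rangle\ e \mapsto \mathtt{inatomic}\langle v.gn\rangle\ e \dashv \bullet|\Psi'|\mathcal{G}}
\]

E-InAtomic-Step
\[
\frac{\begin{array}{c}
v.gn \in \delta \qquad \Psi = \Psi'', v.gn@L \qquad \mathcal{G} = (\{v.gn\} \oplus \mathcal{G}_e) \\
\mu|\delta|\Psi''|\mathcal{G}_e \vdash e \mapsto e' \dashv \mu_\delta|\Psi'''|\mathcal{G}_e' \qquad \Psi' = \Psi''', v.gn@L \qquad \mathcal{G}' = (\{v.gn\} \oplus \mathcal{G}_e')
\end{array}}
{\mu|\delta|\Psi|\mathcal{G} \vdash \mathtt{inatomic}\langle v.gn\rangle\ e \mapsto \mathtt{inatomic}\langle v.gn\rangle\ e' \dashv \mu_\delta|\Psi'|\mathcal{G}'}
\]

E-Split-Value
\[
\frac{\mathcal{G} = (\bullet \parallel \bullet)}
{\mu|\delta|\Psi|\mathcal{G} \vdash \mathtt{split}\langle v.gn\rangle\ \mathtt{between}\ v_1 \parallel v_2 \mapsto \mathtt{null} \dashv \bullet|\Psi|\bullet}
\]

E-InAtomic-Value
\[
\frac{\Psi = \Psi'', v.gn@L \qquad v.gn \in \delta \qquad \Psi' = \Psi'', v.gn@U}
{\mu|\delta|\Psi|\mathcal{G} \vdash \mathtt{inatomic}\langle v.gn\rangle\ v \mapsto v \dashv \bullet|\Psi'|\bullet}
\]

Fig. 15: Dynamic Semantics of ÆMINIUM Expressions [2/2]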


Therefore a correct ÆMINIUM program, while free of data races, might still have
potential deadlocks.

The intuitive idea behind the proof is that, to avoid race conditions at runtime, our
type system checks that all accesses to shared data groups are correctly protected
using an atomic block. Accessing the same object in the heap in a conflicting manner
would result in a race heap. Our proof shows that with ÆMINIUM's type system no
such conflicting operations can occur at runtime.

3.4.1. Type Safety. We state type safety as follows: If $\Gamma|\Sigma|\Delta \vdash_{wf} (\mu|\delta|\Psi|\mathcal{G}|e)$ and
$(\mu|\delta|\Psi|\mathcal{G} \vdash e) \mapsto^{*} (\mu'|\delta'|\Psi'|\mathcal{G}'|e')$ then $\Gamma|\Sigma'|\Delta \vdash_{wf} (\mu'|\delta'|\Psi'|\mathcal{G}'|e')$ and $e'$ is not stuck. In words,
this means that every well-formed (cf. Definition 3.1) program state can take an
arbitrary number of steps and will result in another well-formed program state. We prove
this theorem through induction by leveraging our progress and preservation lemmas
(cf. Sections 3.4.2 and 3.4.3).

Definition 3.1 (Well-Formed Program State). A program state is well formed, written
as $\cdot|\Sigma|\Delta \vdash_{wf} (\mu|\delta|\Psi|\mathcal{G}|e)$, if:

- $\cdot|\Sigma|\Delta \vdash e : T \mid \mathcal{G}$

- $\Gamma|\Sigma \vdash \mu$

- If $o.gn \in \delta$ then there exists the corresponding o.gn

- $\mu \neq \mathit{race}$

- $(o.gn@U \in \Psi \lor o.gn@\_ \notin \Psi) \Rightarrow \nexists\ \mathtt{inatomic}\langle o.gn\rangle \ldots \in e$

- $o.gn@L \in \Psi \Rightarrow \exists$ exactly one $\mathtt{inatomic}\langle o.gn\rangle \ldots \in e$

3.4.2. Progress. Our progress lemma is stated as follows:

LEMMA 3.2 (PROGRESS). If $\Gamma|\Sigma|\Delta \vdash_{wf} (\mu|\delta|\Psi|\mathcal{G}|e)$ (i.e., a well-formed program
state) then either:

- e is a value and $\mathcal{G} = \bullet$

- $\mu|\delta|\Psi|\mathcal{G} \vdash e \mapsto e' \dashv \mu_\delta|\Psi'|\mathcal{G}'$ for some $e', \mu_\delta, \Psi', \mathcal{G}'$

- e stops execution with a null dereference, meaning that the expression e contains a
subexpression of the form null.f

- e is waiting for a resource to become available

In other words, for every well-formed program state, the expression e is either a
value, can take a step to e', causes a null pointer exception, or is waiting to be
able to run (i.e., waiting until all the previous expressions it depends on have
executed). We prove the correctness of our progress lemma through induction on
$\Gamma|\Sigma|\Delta \vdash e : T \mid \mathcal{G}$ (cf. [Stork 2013]).

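3.4.3. Preservation. We state our preservation lemma as follows:

LEMMA 3.3 (PRESERVATION). If $\Gamma|\Sigma|\Delta \vdash_{wf} (\mu|\delta|\Psi|\mathcal{G}|e)$ with $\Gamma|\Sigma|\Delta \vdash e : T \mid \mathcal{G}$ and
$\mu|\delta|\Psi|\mathcal{G} \vdash e \mapsto e' \dashv \mu_\delta|\Psi'|\mathcal{G}'$ and $\mu' = [\mu_\delta]\mu$ then there exist:

- $\Sigma' \supseteq \Sigma$

- $T'$

such that:

- $\Gamma|\Sigma'|\Delta \vdash_{wf} (\mu'|\delta|\Psi'|\mathcal{G}'|e')$ with $\Gamma|\Sigma'|\Delta \vdash e' : T' \mid \mathcal{G}'$ and $T' <: T$

In other words, if we start with a well-formed program state and the expression
e steps to e', we end in a well-formed program state again. We prove this lemma by
induction on $(\mu|\delta|\Psi|\mathcal{G}|e) \mapsto (\mu'|\delta'|\Psi'|\mathcal{G}'|e')$ (cf. [Stork 2013]).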

4.  IMPLEMENTATION 

Our  implementation  is  based  on  the  Plaid  programming  language  and  is  publicly  avail¬ 
able  in  our  Google  Code  repository  [Stork  et  al.  2012b].  The  overall  system  architecture 
is  shown  in  Figure  16.  The  compiler  user  writes  Plaid  code  and  feeds  it  into  our  com¬ 
piler.  The  compiler  first  translates  the  Plaid  source  code  into  an  Abstract  Syntax  Tree 
(AST).  The  newly  generated  AST  is  then  used  by  the  type  checker  to  check  that  the 
input  program  does  not  violate  Plaid’s  typing  rules.  In  addition  to  typechecking  the 
program,  the  type  checker  also  computes  a  sequential  dependency  graph  based  on  the 
permission  flow.  The  dependency  graph  design  and  optimizations  follow  the  general 
idea  of  Cliff  Click’s  sea  of  nodes  [Click  and  Paleczny  1995]  in  which  he  replaces  the 
AST representation with a graph structure. The AST and the dependency graph are
then used by the ÆMINIUMfier, which analyses and transforms the sequential dependency
graph into a parallel dependency graph. The parallel dependency graph and the
AST are then used by the Task Builder to cluster operations into coarser tasks.
The generated task graph and the AST are used by the Code Generator to generate the final
Java bytecode.

The generated code uses the Plaid and ÆMINIUM runtime libraries to create and
manage objects and parallelism. The Plaid runtime is responsible for managing states,
objects and Java interoperability. The ÆMINIUM runtime is responsible for managing
the  execution  of  the  tasks  generated  by  the  program.  The  following  sections  elaborate 
on  the  extensions  we  made  to  the  Plaid  compiler. 
4.1. Plaid Primer

This  section  provides  a  short  introduction  to  the  Plaid  programming  language,  ex¬ 
plaining  all  necessary  constructs  required  for  this  paper.  Please  refer  to  the  official 
Plaid  language  specification  [Aldrich  et  al.  2012]  for  a  more  in-depth  overview  of  Plaid. 
By design, the Plaid language resembles the Java language as much as possible. The
main conceptual difference between Plaid and Java is the usage of states instead of
classes. Conceptually, Plaid uses state abstractions to naturally encode the various
states an object can be in, in a direct and checkable way. We discuss state composition and
state change semantics in [Sunshine et al. 2011]. An overview of Plaid's type system is
given in [Naden et al. 2012]. Those concepts are orthogonal to ÆMINIUM's parallelization
approach and we therefore limit ourselves to a subset of Plaid which most closely
resembles normal Java.

Listing  1  shows  simple  Counter  code  emphasizing  the  commonalities  with  Java.  In 
line  1  we  define  a  new  state  Object.  States,  similar  to  Java  classes,  consist  of  a  col¬ 
lection  of  fields  and  methods  that  operate  on  those  fields.  Instead  of  using  the  class 
keyword  Plaid  uses  the  state  keyword  to  declare  such  a  collection.  As  in  Java,  we 
call  the  instances  of  states  objects.  Line  2  shows  that  the  Object  state  defines  only 
one  method  called  toString.  Plaid’s  method  declaration  follows  the  same  syntax  as 
a  Java  method  declaration,  with  the  following  exceptions.  All  method  declarations  in 
Plaid  start  with  the  keyword  method  to  indicate  the  start  of  a  new  method  declaration. 
Note that Plaid does not support Java's modifiers (e.g., public, final, abstract)
but has its own (discussed later). After the method keyword we have the return type of
the  method  followed  by  the  method  name  and  its  parameter  list.  After  the  parameter 
list  we  have  the  so-called  environment  of  the  method  declared  in  square  brackets.  The 
environment  is  an  implicit  parameter  list  specifying  all  the  variables  that  are  implic¬ 
itly  passed  into  the  method  or  are  captured  from  the  enclosing  lexical  environment.  As 
shown  in  the  example,  the  environment  contains  the  declaration  of  the  this  reference. 
 1  state Object {
 2    method immutable String toString() [ local immutable Object this ];
 3  }
 4
 5  state Counter case of Object {
 6    var immutable Integer count = 0;
 7
 8    method void inc() [ unique Counter this ] {
 9      this.count = this.count + 1;
10    }
11
12    method void dec() [ unique Counter this ] {
13      this.count = this.count - 1;
14    }
15
16    method immutable Integer get() [ local immutable Counter this ] {
17      this.count
18    }
19
20    method immutable String toString() [ local immutable Counter this ] {
21      "Counter(" + this.count.toString() + ")"
22    }
23  }

Listing 1: Basic Plaid Example


1  method immutable Integer fibonacci(immutable Integer n) {
2    match ( n <= 2 ) {
3      case True { 1 }
4      default {
5        fibonacci(n - 1) + fibonacci(n - 2)
6      }
7    }
8  }

Listing 2: Plaid Fibonacci Example


Note the additional local keyword in front of the immutable permission of the this
reference. local is a permission modifier that allows the caller of a method to recover
the  permission  passed  in,  without  requiring  the  user  to  worry  about  concrete  fractions 
(cf.  [Naden  et  al.  2012]).  The  this  reference  is  implicitly  passed  into  the  method  and 
therefore  we  need  to  specify  which  permissions  we  need.  After  the  environment  we 
usually  would  declare  the  method  body  in  curly  braces,  but  in  this  case  we  finish  the 
declaration  with  a  semicolon  to  indicate  an  abstract  method  declaration. 

In line 5 we define a new state Counter as a sub-state of Object. Plaid uses case
of instead of Java's extends to declare sub-typing. The Counter defines a local field in
line 6. All field and variable declarations start with either val (immutable) or var
(mutable). In lines 8, 12 and 16 the Counter defines methods to increase, decrease
or retrieve the current counter value. In Plaid, like in Smalltalk [Goldberg and Robson
1983], everything is an object. This means that, unlike in Java, there are no primitive
types (such as int or boolean). The addition operation 'this.count + 1' in line 9 is translated
into a method call with the first operand as the receiver, i.e., 'this.count.+(1)'. This
is possible because Plaid supports methods named after operator symbols. Another
important observation is the absence of the return statement in Plaid. Plaid automatically
returns the value of the last statement in a method body. In line 20 the Counter
state implements the abstract toString method as defined by its super state.
1  immutable state Boolean { ... }
2
3  state True case of Boolean { ... }
4
5  state False case of Boolean { ... }

Listing 3: Plaid Boolean


Pattern  matching  is  the  only  control  flow  mechanism  built  into  the  Plaid  program¬ 
ming  language.  The  pattern  matching  in  Plaid  currently  works  on  the  type  level,  and 
does  not  allow  the  automatic  binding  of  internal  fields  to  local  variables.  The  simplest 
way  to  describe  Plaid's  match  statement  is  to  think  of  Java's  switch  statement  com¬ 
bined  with  instanceof  operations  to  test  for  matching  types  instead  of  values.  An  exam¬ 
ple  of  Plaid's  pattern  matching  is  shown  in  Listing  2.  The  example  shows  a  Plaid  im¬ 
plementation  of  the  Fibonacci  number  computation.  The  example  uses  a  global  method 
defined in line 1. Global methods in Plaid are like static methods in Java, meaning
they can be called without having an object instance available. In line 2 the match
block starts. It takes the result of the expression n <= 2 and checks which case
matches the result type. The result of the comparison is of type Boolean.

Note  that  in  Plaid  booleans  are  not  part  of  the  language  and  are  implemented  as 
part  of  the  standard  library.  Listing  3  shows  an  abbreviated  version  of  Plaid's  boolean 
declaration.  Line  1  defines  the  top-level  Boolean  type.  Lines  3  and  5  define  two  orthogo¬ 
nal  subtypes,  one  for  true  values  and  one  for  false  values.  The  definition  of  the  Boolean 
state  also  demonstrates  Plaid's  default  permission.  The  state  declaration  is  annotated 
with  an  immutable  permission.  This  allows  the  user  to  omit  the  permission  annotation 
for the Boolean type and the Plaid compiler will automatically extend it with the default
permission specified on the state declaration (in this case an immutable permission).

Coming back to the Fibonacci example in Listing 2: in line 3 we define a case to check
if the value of the comparison operation is of type True. If so we simply return the
constant value one. Line 4 declares the default case, which is used when no other
case applies. In this case we simply use the recursive definition of Fibonacci numbers
to  compute  the  result.  Note  that  the  result  of  the  method  body  is  the  value  to  which 
the  last  statement  reduces.  In  this  case,  the  last  statement  is  the  match  block,  which 
evaluates  to  the  value  of  the  executed  case. 
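
The type-level matching described above can be mimicked in other languages. The following Python sketch is only an analogy, not Plaid semantics: the classes Boolean, TrueState and FalseState and the helper less_equal are hypothetical stand-ins for Plaid's library-defined Boolean states and its comparison method, and isinstance plays the role of the match block's type test.

```python
class Boolean:
    """Hypothetical stand-in for Plaid's top-level Boolean state."""


class TrueState(Boolean):
    """Stand-in for the Plaid state True."""


class FalseState(Boolean):
    """Stand-in for the Plaid state False."""


def less_equal(a, b):
    # The comparison yields an object whose *type* encodes the outcome.
    return TrueState() if a <= b else FalseState()


def fibonacci(n):
    result = less_equal(n, 2)
    if isinstance(result, TrueState):   # corresponds to: case True { 1 }
        return 1
    # corresponds to the default case
    return fibonacci(n - 1) + fibonacci(n - 2)
```

As in the Plaid version, the dispatch depends only on the type of the comparison result, never on a primitive boolean value.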

4.2.  Type  Checker  Extensions 

Because  Plaid’s  type  checker  already  had  support  for  access  permissions,  our  first  ex¬ 
tension  was  adding  support  for  data  groups  and  data  group  permissions.  The  over¬ 
all  implementation  of  data  groups  and  permissions  is  straightforward  and  analogous 
to the existing implementation of access permissions (with the exception that access
permissions are automatically split/merged, while group permissions are manually
split/merged). The second extension we made to the typechecker was the generation
of  a  permission  flow  graph.  Because  of  Plaid’s  eager  typechecker  implementation  (i.e., 
access  permissions  are  merged  back  as  soon  as  possible)  the  resulting  permission  flow 
graph  does  not  capture  all  the  possible  parallelism.  Instead  of  reimplementing  Plaid’s 
typechecker  in  a  non-eager  way,  we  decided  to  remove  the  eagerness-induced  sequen¬ 
tiality  via  an  extra  compiler  pass  (cf.  Section  4.3). 

4.3. ÆMINIUMfier

The ÆMINIUM parallelizing pass runs directly after the typechecking pass and transforms
the sequential dependency graph inferred by the typechecker into a parallel
Name                   Description

Chained Splits         Simplifies chains of split nodes introduced by binary
                       permission split rules.
Chained Joins          Simplifies chains of join nodes introduced by binary
                       permission join rules.
Unique Join/Split      Removes unnecessary split/join operations which split
                       nothing off a unique permission.
Symmetric Join/Split   Transforms sequential dependencies to symmetric
                       permissions into parallel dependencies.

Fig. 17: Parallelizing Peephole Optimizations


Fig.  18:  Chained  Split  Block  Optimization 


version by applying multiple peephole optimizations [McKeeman 1965]. A peephole
optimization searches for specific patterns inside generated code (in our case the 'code'
is the dependency graph) and replaces those patterns with simpler or more efficient
ones. The following sections explain each optimization and Figure 17 provides a short
summary.

4.3.1.  Simplification  of  Chained  Splits.  Typechecking  follows  a  bottom-up  approach.  This 
leads  to  cases  where  multiple  subsequent  permissions  can  be  split  off  the  same  variable 
before  they  get  merged  back.  A  simple  example  of  such  a  case  would  be  typechecking 
a  method  call  where  the  same  variable  is  passed  multiple  times  as  a  parameter  to  the 
call.  This  chaining  of  permission  splits  is  unnecessary  and  can  be  optimized.  Instead 
of  having  a  binary  split  node  and  building  chains  of  them  we  simply  merge  those  nodes 
to  create  one  n-ary  split  node.  Figure  18  illustrates  this  operation.  The  graph  on  top 
shows a chain of split nodes along with the nodes depending on them (δ_1, ..., δ_{n+1}). The
optimization  is  applied  locally  to  individual  nodes.  For  every  node  in  the  graph  the 
algorithm  checks  whether  the  current  node  is  a  split  node.  If  it  is  a  split  node  it  will 
check  if  the  input  permission  is  the  same  as  the  output  permission  and  if  the  current 
Deleting a node δ from the ÆMINIUM dependency graph simply removes the node
from the graph and makes all nodes which depend on it (δ_o1, ..., δ_on) depend on
all the nodes the removed node depended on (δ_i1, ..., δ_in). The algorithm is shown
below.

Fig.  19:  Node  Delete  Operation 


Fig.  20:  Chained  Join  Block  Optimization 


node  depends  on  another  split  block.  If  all  conditions  hold  the  algorithm  deletes  the 
current  split  block  from  the  graph  while  preserving  its  dependencies  (see  Figure  19). 
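
The node delete operation and the chained-split check can be sketched as follows. This is a minimal illustration, assuming a dependency graph stored as a dictionary from node names to Node records; the Node fields (kind, perm_in, perm_out, deps) are hypothetical and do not reflect the data structures of the actual compiler.

```python
from dataclasses import dataclass, field


@dataclass
class Node:
    name: str
    kind: str                 # "split", "join" or "op" (assumed encoding)
    perm_in: str = ""         # kind of permission entering the node (assumed field)
    perm_out: str = ""        # kind of permission leaving the node (assumed field)
    deps: set = field(default_factory=set)  # names of nodes this node depends on


def delete_node(graph, name):
    """Remove `name`; every dependent inherits the removed node's dependencies."""
    removed = graph.pop(name)
    for node in graph.values():
        if name in node.deps:
            node.deps.discard(name)
            node.deps |= removed.deps


def simplify_chained_splits(graph):
    """Fold a chain of binary split nodes into the first split node of the chain."""
    for name in list(graph):
        node = graph[name]
        # Only split nodes whose input and output permission are the same qualify.
        if node.kind != "split" or node.perm_in != node.perm_out:
            continue
        # The node must itself depend on another split node.
        if any(graph[dep].kind == "split" for dep in node.deps):
            delete_node(graph, name)
```

After the pass, all operations that hung off the chain depend directly on the remaining split node, which then acts as the n-ary split.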

4.3.2.  Simplification  of  Chained  Joins.  Similar  to  chained  splits,  the  typechecker  can  gen¬ 
erate  chained  join  nodes  that  merge  the  chained  split  permissions  back  into  the  origi¬ 
nal  permission.  Therefore  the  same  principle  can  be  applied  and  we  can  reduce  these 
chains  to  a  single  join  node.  Figure  20  shows  the  approach  and  the  algorithm.  The  al¬ 
gorithm  operates  on  individual  nodes.  It  first  selects  all  join  nodes.  Then  for  every  join 
node  the  algorithm  checks  whether  the  node  joins  the  input  permission  into  the  same 
kind of permission. If it does, the algorithm checks if there is any other
join  node  depending  on  the  current  node.  If  all  conditions  hold  the  algorithm  deletes 
the  current  node,  again  while  preserving  dependencies. 

4.3.3.  Simplification  of  Unique  Split/Join  Sequences.  The  typechecker  may  sometimes  need 
to  split  off  a  unique  permission  from  a  variable,  leaving  a  none  permission  associated 
with  the  variable.  Later,  when  the  unique  permission  is  returned  to  the  variable,  the 
typechecker  merges  the  incoming  unique  permission  with  the  available  none  permis¬ 
sion.  This  is  a  typical  scenario  for  method  calls  where  the  permission  gets  conceptually 
split  off  from  the  variable  and  later  (after  the  method  call)  merged  back.  Figure  21 
shows  the  scenario  on  the  left  hand  side  where  a  unique  permission  from  a  has  been 
Fig. 21: Simplify Unique Join/Split sequences


Fig.  22:  Symmetric  Join/Split  Optimization 


split off to satisfy the operation δ_2. Figure 21 also shows the algorithm to implement
this optimization, which simply removes those unnecessary nodes.

4.3.4. Simplification of Symmetric Join/Split Sequences. The current version of the
typechecker implements a greedy approach for merging permissions back. For every
operation, the greedy approach splits off the required permissions and joins them back as
soon  as  they  become  available  again  (i.e.,  the  operation  completes).  This  leads  to  the 
problem  that  if  two  operations  require  a  symmetric  permission  the  typechecker  creates 
unnecessary  dependencies. 

To  solve  this  issue  we  want  to  detect  such  unnecessary  join/split  patterns  and  elimi¬ 
nate  them  such  that  both  operations  can  operate  in  parallel.  Figure  22  shows  how  we 
remove  those  inner  join/split  nodes  and  reorganize  the  graph  so  that  we  initially  split 
multiple  symmetric  permissions  off  the  original  permission  and  execute  the  operations 
in  parallel. 
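
The effect of this optimization on a single variable can be illustrated with a small sketch. It assumes the operations on the variable are given in program order together with the kind of permission each one requires, and that the set of symmetric permission kinds is supplied by the caller (the default of immutable only is an assumption made for this example). The function returns groups of operations: groups run one after another, while operations inside a group may run in parallel.

```python
def parallel_groups(ops, symmetric=frozenset({"immutable"})):
    """Group consecutive operations that need the same symmetric permission.

    ops: list of (operation_name, permission_kind) in program order.
    """
    groups = []
    for name, perm in ops:
        if groups and perm in symmetric and groups[-1][0] == perm:
            # The inner join/split pair is unnecessary: run next to the previous op.
            groups[-1][1].append(name)
        else:
            # A different or non-symmetric permission keeps the sequential dependency.
            groups.append((perm, [name]))
    return [names for _, names in groups]
```

For the Counter of Listing 1, a call sequence inc, get, toString, dec would yield three groups, with get and toString sharing the middle one.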

4.4.  Taskbuilder 

Generating  a  new  task  for  every  node  in  the  dependency  graph  (i.e.,  one  task  per  opera¬ 
tion)  is  prohibitively  expensive  because  the  ratio  of  task  work  to  task  creation  overhead 
is too small. Therefore, we developed the Task Builder Pass, which combines multiple
operations  into  bigger  tasks.  Figure  23  shows  the  basic  idea.  The  task  builder  takes 
Fig. 23: Task Builder Approach: (a) Dependency Graph, (b) Dependency Graph with Task
Overlay, (c) Task Graph
Name                                 Description

Sequentializing Single Task Graphs   Generate sequential code for methods which
                                     have a task graph of only one node.
Inlining Starter Task                Inline start task into method body code block.
Inlining Body Task                   Inline body task into the method body code
                                     block.

Fig. 24: Overview of Code Generation Optimizations


as  input  a  dependency  graph  (see  Figure  23(a))  and  then  computes  which  operations 
can  be  mapped  into  the  same  task  without  losing  parallelism.  Figure  23(b)  shows  the 
input graph with the task clustering. The task builder outputs a graph consisting only
of tasks (see Figure 23(c)).

The  general  idea  behind  the  task  builder  is  called  edge  zeroing.  The  task  builder  uses 
a  cost  metric  to  estimate  the  overall  execution  costs  of  a  specific  dependency  graph.  The 
algorithm  then  analyses,  for  every  edge  in  the  dependency  graph,  how  removing  the 
edge  and  merging  the  connecting  nodes  would  affect  the  execution  cost  of  the  whole 
graph.  If  the  execution  cost  does  not  increase,  the  task  builder  removes  the  current 
edge  from  the  graph  and  merges  together  the  nodes  formerly  connected  by  that  edge. 
The following paragraphs explain the task builder in more detail.
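
The edge zeroing loop can be sketched as follows. This is a simplified illustration and not the full algorithm of [Sarkar 1989]: the cost metric used here (the longest path through the graph, where every task pays its work plus a fixed TASK_OVERHEAD) and the value of that constant are assumptions made for the example, and a task is treated as an atomic unit that starts only after all tasks it depends on have finished.

```python
TASK_OVERHEAD = 10  # assumed fixed cost of creating and scheduling one task


def critical_path(work, edges):
    """Longest path length; every task costs its work plus TASK_OVERHEAD."""
    preds = {n: set() for n in work}
    for u, v in edges:
        preds[v].add(u)
    finish = {}

    def finish_time(n):
        if n not in finish:
            start = max((finish_time(p) for p in preds[n]), default=0)
            finish[n] = start + work[n] + TASK_OVERHEAD
        return finish[n]

    return max(finish_time(n) for n in work)


def has_other_path(edges, u, v):
    """True if v is reachable from u without using the direct edge (u, v)."""
    succs = {}
    for a, b in edges:
        if (a, b) != (u, v):
            succs.setdefault(a, set()).add(b)
    stack, seen = [u], set()
    while stack:
        n = stack.pop()
        for m in succs.get(n, ()):
            if m == v:
                return True
            if m not in seen:
                seen.add(m)
                stack.append(m)
    return False


def merge(work, edges, u, v):
    """Zero the edge (u, v): fold v into u and redirect v's edges to u."""
    new_work = {n: w for n, w in work.items() if n != v}
    new_work[u] = work[u] + work[v]
    new_edges = set()
    for a, b in edges:
        a, b = (u if a == v else a), (u if b == v else b)
        if a != b:
            new_edges.add((a, b))
    return new_work, new_edges


def edge_zeroing(work, edges):
    """Merge along edges as long as the estimated cost does not increase."""
    work, edges = dict(work), set(edges)
    changed = True
    while changed:
        changed = False
        for u, v in sorted(edges):
            if has_other_path(edges, u, v):
                continue  # merging would introduce a cycle between tasks
            cand_work, cand_edges = merge(work, edges, u, v)
            if critical_path(cand_work, cand_edges) <= critical_path(work, edges):
                work, edges, changed = cand_work, cand_edges, True
                break
    return work, edges
```

With this metric a purely sequential chain collapses into a single task, whereas two expensive independent operations are kept apart because merging either of them with a neighbour would lengthen the critical path.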

Our  task  builder  algorithm  is  based  on  Sarkar’s  Algorithm  (SA,  [Sarkar  1989]).  To 
work  properly,  SA  needs  to  know  the  runtime  costs  for  every  operation  in  the  graph. 
This  cost  can  be  easily  estimated  for  all  operations  except  method  calls.  To  enable 
SA  to  perform  more  aggressive  optimizations,  we  provide  a  simple  categorization  of 
the  methods.  We  differentiate  between  normal  methods  and  cheap  methods.  Cheap 
methods,  defined  via  cheap  annotations  on  their  declarations,  are  relatively  short  in 
their  execution  and  do  not  justify  the  creation  of  parallelism  by  themselves.  We  prefer 
annotations  to  inference  for  modularity  reasons,  but  the  compiler  verifies  that  methods 
annotated  as  cheap  call  only  other  cheap  methods.  Other  static  [Blelloch  and  Greiner 
1996]  or  dynamic  [Acar  et  al.  2011]  approaches  to  determine  runtime  costs  have  been 
proposed  and  are  generally  applicable  to  our  system. 
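
The verification of cheap annotations amounts to a check over the call graph. The sketch below assumes the call graph is available as a mapping from method names to the names of the methods they call and that the annotated methods are given as a set; both representations are hypothetical and chosen only for illustration.

```python
def check_cheap(calls, cheap):
    """Return (caller, callee) pairs where a cheap method calls a non-cheap one."""
    return sorted(
        (caller, callee)
        for caller in cheap
        for callee in calls.get(caller, ())
        if callee not in cheap
    )
```

An empty result means every method annotated as cheap calls only other cheap methods; each reported pair identifies a call that a compiler would have to reject.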


4.5.  Code  Generator 

While  the  task  builder  tries  to  minimize  the  number  of  tasks,  there  are  still  a  few  opti¬ 
mizations  that  can  be  performed  during  code  generation  to  further  reduce  the  number 
of  created  tasks.  The  following  sections  present  several  optimizations  that  can  help 
in  this  regard  (cf.  Figure  24  for  a  summary  overview).  We  discuss  each  optimization 
separately  to  focus  on  its  core  idea.  We  present  all  optimizations  in  the  context  of 
method calls, but note that the optimizations are also applicable to other
constructs,  such  as  case  statements  in  a  match  block.  To  focus  on  the  optimization  tech¬ 
niques,  and  for  brevity  reasons,  we  use  the  generic  scheduling  algorithm  as  a  basis  for 
our  extensions  when  we  present  those  optimizations. 


Sequentializing  Single  Task  Graphs.  If  the  task  builder  manages  to  reduce  the  task 
graph  of  a  whole  method  body  to  a  single  task,  then  code  generation  will  inline  this 
task.  This  results  in  the  generation  of  a  sequential  method  body,  equivalent  to  the 
sequential  method  body  that  would  have  been  generated  by  the  standard  Plaid  code 
generator. 
method PlaidObject m(...) {
    τ@[δ](x)
}

Listing 4: Single Task Function Graph

method PlaidObject m(...) {
    δ_τ
}

Listing 5: Single Task Function Code

Inlining  Body  Tasks.  Because  the  method  always  has  to  wait  for  the  main  body  task 
to  complete,  we  can  inline  this  task  into  the  method  body  and  avoid  the  creation  and 
synchronization  overhead  for  this  task.  Listing  6  shows  our  code  generation  strategy, 
which  is  comprised  of  the  following  steps: 

(1) Variable extraction. No changes.

(2) Task creation. We create all tasks except the body task.

(3) Task scheduling. No changes.

(4) Wait for dependencies. Wait for all tasks the body task depends on to complete.

(5) Execute body task. Execute the remaining operations of the body task and return
the value of the last statement.


method PlaidObject m(...) {
}

Listing 6: Inline Body Task Graph

public PlaidObject m(...) {
    // create variables
    (1) PlaidObject[] _ = new PlaidObject[|{VarDecl(x) ∈ {δ : τ@[δ](x)}}|];
    // create task objects
    (2) ∀τ_i ∈ (τ \ BODY_TASK(τ)) : Task T_τi = new Task(|DEPS(τ_i)|) {
            public void run() {
                IS_CASE_TASK(τ_i) ⇒ if ( CASE_MATCH_COND(τ_i) ) { δ_τi }
                ¬IS_CASE_TASK(τ_i) ⇒ δ_τi
                ∀τ' ∈ RDEPS(τ_i) : if ( T_τ' != BODY_TASK(τ) &&
                                        T_τ'.decDepCount() == 0 ) {
                    schedule(T_τ');
                }
            }
        };
    // compute dependencies and schedule tasks
    (3) ∀τ_i ∈ START_TASKS(τ) : schedule(T_τi);
    // wait for dependencies of the body task to finish
    (4) ∀τ_i ∈ DEPS(BODY_TASKS(τ)) : T_τi.wait();
    (5) return δ_BODY_TASKS(τ);
}

Listing 7: Inline Body Task Code
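
The scheduling scheme with an inlined body task can be sketched in executable form. The sketch below uses Python's standard thread pool rather than the ÆMINIUM runtime; the Task class, its field names and the run_method helper are hypothetical and only mirror the five steps above. Error handling is omitted: an action that raises would leave its dependents waiting.

```python
import threading
from concurrent.futures import ThreadPoolExecutor


class Task:
    def __init__(self, name, action, dep_count):
        self.name, self.action = name, action
        self.dep_count = dep_count      # number of unfinished dependencies
        self.rdeps = []                 # tasks that depend on this task
        self.done = threading.Event()
        self._lock = threading.Lock()

    def dec_dep_count(self):
        with self._lock:
            self.dep_count -= 1
            return self.dep_count


def run_method(tasks, body, pool):
    """tasks: every Task except the body task; body: the task to run inline."""

    def execute(task):
        task.action()
        task.done.set()
        for succ in task.rdeps:
            # The body task is never scheduled; the calling thread runs it.
            if succ is not body and succ.dec_dep_count() == 0:
                pool.submit(execute, succ)

    # Determine the start tasks before anything runs, then schedule them.
    start_tasks = [t for t in tasks if t.dep_count == 0]
    for task in start_tasks:
        pool.submit(execute, task)
    # Wait for all tasks the body task depends on.
    for task in tasks:
        if body in task.rdeps:
            task.done.wait()
    # Execute the body task inline and return its value.
    return body.action()
```

A caller would create the pool once, for example with `ThreadPoolExecutor(max_workers=2)`, build the tasks with their reverse dependencies, and invoke run_method in place of the method body.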


Inlining  Single  Starter  Task.  If  a  task  graph  has  only  one  starter  task,  we  can  inline 
this  task,  similar  to  the  inlining  of  the  body  task. 

(1) Variable extraction. No changes.

(2) Execute start task code. Execute the operations associated with the start task
directly in the method body.

(3) Task creation. We create all tasks except the start task.

(4) Task scheduling. Schedule all tasks which depend on the original start
task.

(5) Wait for body task. No changes.


method PlaidObject m(...) {
}

Listing 8: Inlining Start Task Graph

public PlaidObject m(...) {
    // create variables
    (1) PlaidObject[] _ = new PlaidObject[|{VarDecl(x) ∈ {δ : τ@[δ](x)}}|];
    // execute start task code
    (2) δ_START_TASK(τ)
    // create task objects
    (3) ∀τ_i ∈ (τ \ START_TASK(τ)) : Task T_τi = new Task(|DEPS(τ_i)|) {
            public void compute() {
                IS_CASE_TASK(τ_i) ⇒ if ( CASE_MATCH_COND(τ_i) ) { δ_τi }
                ¬IS_CASE_TASK(τ_i) ⇒ δ_τi
                ∀τ' ∈ RDEPS(τ_i) : if ( T_τ'.decDepCount() == 0 ) { schedule(T_τ'); }
            }
        };
    // compute dependencies and schedule tasks
    (4) ∀τ_i ∈ RDEPS(START_TASKS(τ)) : schedule(T_τi);
    // wait for dependencies of the body task to finish
    (5) return T_BODY_TASK(τ).wait();
}

Listing 9: Inlining Start Task Code

4.5.1. Dynamic Load Balancing. Despite the optimizations above, our system can produce
significantly more tasks than we have parallel execution units. To avoid the
high costs of task creation and scheduling we implemented the dynamic load-balancing
approach  shown  in  Listing  10.  Every  method  that  supports  parallel  execution  first  per¬ 
forms  a  check  whether  we  have  enough  parallelism  (i.e.,  enough  generated  tasks  to 
utilize  the  available  computation  units)  or  not  by  calling  the  PARALLELIZE  method.  If 
this  method  returns  false  it  means  that  we  have  enough  work  and  should  not  gener¬ 
ate  new  work.  In  this  case  we  simply  execute  the  sequential  method  body  instructions. 
If  the  return  value  is  true  we  need  to  generate  more  parallel  work  and  we  execute  the 
parallel  method  body  implementation  as  described  earlier. 

The PARALLELIZE method implementation checks whether there are threads without
work. Because we call the PARALLELIZE method on every method invocation, determining
the current state of all threads is prohibitively expensive. To overcome this problem
we guard the check with a global variable estimating the lack of parallel work. This
global variable is updated when threads create new tasks and when threads are running
out of work. To further reduce runtime overhead, none of the accesses to this variable
is synchronized. The lack of synchronization obviously leads to race conditions
and lost updates. In the scheme we apply when updating the variable, lost updates
only ever lead to the creation of additional tasks and never to starving threads (refer
to our implementation for the exact details).
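As a rough illustration of such a guard, the sketch below keeps a plain, unsynchronized counter that estimates the lack of parallel work. The paper does not give the exact update scheme, so everything here is an assumption: the names lackOfWork, workerFoundNoWork, and taskCreated, the worker count, and the rule that an idle worker stores a fresh estimate while task creation decrements it.

```java
class ParallelizeHintSketch {
    // Hypothetical number of worker threads.
    static final int WORKERS = 4;

    // Estimate of missing parallel work. Deliberately neither volatile nor
    // atomic: updates may be lost, and readers may see a stale value.
    static int lackOfWork = 0;

    // Cheap check executed at the start of every parallelizable method.
    static boolean parallelize() {
        return lackOfWork > 0;
    }

    // Assumed to be called by a worker each time it polls for work and finds
    // none. It is a plain store, so repeating it re-asserts a request that a
    // racing decrement may have overwritten.
    static void workerFoundNoWork() {
        lackOfWork = WORKERS;
    }

    // Assumed to be called whenever a task is created. If this decrement is
    // lost, the estimate stays too high and merely causes extra tasks.
    static void taskCreated() {
        int seen = lackOfWork;
        if (seen > 0) lackOfWork = seen - 1;
    }

    // Shape of a generated method: choose a branch per invocation.
    static String m() {
        if (parallelize() == false) {
            return "sequential";
        }
        taskCreated();
        return "parallel";
    }

    public static void main(String[] args) {
        lackOfWork = 0;
        System.out.println(m());        // no idle worker reported yet
        workerFoundNoWork();
        for (int i = 0; i < WORKERS + 1; i++) {
            System.out.println(m());    // parallel until the estimate reaches 0
        }
    }
}
```

In this sketch the estimate errs towards creating work. Note that under the Java memory model a non-volatile field gives no visibility guarantee at all, so a production version would need to confirm that stale reads are acceptable on the target VM.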

An  important  observation  is  that  when  we  execute  the  sequential  code  branch  the 
sequentiality  is  only  enforced  for  the  current  method.  If  the  sequential  code  call  a  func¬ 
tion  which  contains  potential  parallel  executions  this  function  will  do  the  same  check 
to  determine  if  it  should  parallelize  the  code  or  not.  This  is  an  important  feature  of  the 
system  as  it  allows  us  to  recover  from  heavily  imbalanced  code  paths.  The  drawback 
of  this  approach  is  that  we  have  to  check  for  parallelization  on  every  method  that  has 
potential  parallelism. 
public PlaidObject m(PlaidObject pthis, ...) {
    if ( PARALLELIZE() == false ) {
        ... // sequential code
    } else {
        ... // parallel code
    }
}


Listing  10:  Dynamic  Load  Balancing 


atomic {        GLOBAL_DATAGROUP.enterAtomic();
    ...             ...
}               GLOBAL_DATAGROUP.leaveAtomic();

Listing  11:  Atomic  Block  Translation 


public PlaidObject m(PlaidObject pthis, ...) {
    if ( GLOBAL_DATAGROUP.inAtomic() ) {
        ... // sequential code
    } else {
        ... // parallel code
    }
}


Listing  12:  Global  Atomic  Test 


4.5.2. Atomic Block Implementation. Our implementation allows seamlessly mixing code
with and without data groups. Code without data groups uses
plain shared permissions and atomic blocks without any datagroup parameters.
In this datagroup-less mode we implicitly pass a share datagroup permission to
an anonymous global datagroup into every method. Listing 11 shows that we simply
translate an atomic block into enterAtomic and leaveAtomic method calls on the
corresponding datagroup. For simplicity, we decided to sequentialize the execution of
the body of a global atomic block once it has been entered. This means that when we call
a method from inside a global atomic block, this method needs to execute sequentially
even if it could execute in parallel. There are two approaches to achieve this behavior.
The first option is to have a dynamic check at runtime to force sequential execution.
The second option is to have two versions of every method: one version that is called by
default and another version that can only be called from inside an atomic block, directly
or transitively (cf. AtomJava [Hindman and Grossman 2006]). We decided to go for the
dynamic approach because it can be easily merged with dynamic load balancing and
avoids code explosion. Listing 12 shows the implementation of the global atomic block
sequentializing check.

If we have actual data groups, we translate an atomic block the same
way, except that we replace GLOBAL_DATAGROUP with the corresponding
data groups specified by the user. Note that we do not have to sequentialize the
execution of methods called from inside a non-global atomic block, as we have explicitly
specified datagroup permissions which automatically enforce sequentialization where
necessary.
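The global atomic translation and the dynamic check can be sketched in Java as follows. The method names enterAtomic, leaveAtomic, and inAtomic and the GLOBAL_DATAGROUP object come from Listings 11 and 12; backing them with a ReentrantLock, the try/finally shape, and the stub parallelize() method are assumptions of this sketch rather than a description of the actual runtime.

```java
import java.util.concurrent.locks.ReentrantLock;

class GlobalAtomicSketch {
    // Hypothetical runtime representation of a data group.
    static final class DataGroup {
        private final ReentrantLock lock = new ReentrantLock();

        void enterAtomic() { lock.lock(); }
        void leaveAtomic() { lock.unlock(); }

        // True while the calling thread is inside an atomic block on this group.
        boolean inAtomic() { return lock.isHeldByCurrentThread(); }
    }

    static final DataGroup GLOBAL_DATAGROUP = new DataGroup();

    // Stand-in for the load-balancing check; always asks for parallel work here.
    static boolean parallelize() { return true; }

    // Generated method: the atomic test is merged with the load-balancing test,
    // so a single branch decides between the two method bodies.
    static String m() {
        if (GLOBAL_DATAGROUP.inAtomic() || !parallelize()) {
            return "sequential";
        }
        return "parallel";
    }

    // Translation of: atomic { m() }
    static String callInsideAtomic() {
        GLOBAL_DATAGROUP.enterAtomic();
        try {
            return m();
        } finally {
            GLOBAL_DATAGROUP.leaveAtomic();
        }
    }

    public static void main(String[] args) {
        System.out.println(m());                // parallel
        System.out.println(callInsideAtomic()); // sequential
    }
}
```

Because the check is per thread and per call, methods invoked transitively from the atomic block also take the sequential branch without needing a second compiled version.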
4.6. Implementation Reflection

The goal of our implementation was to be as fast as possible. During our initial
experiments it became obvious that creating and executing fine-grained tasks on a
large scale was prohibitively expensive. Therefore our goal was to eliminate as many
tasks as possible. In our experience, the load balancing method resulted in the most
dramatic reduction in the number of tasks. The dynamic load balancing approach only
generates as many tasks as needed to utilize system resources. Nevertheless, all
the other optimizations we described play an important part in our achieved
performance (cf. Section 5). Those optimizations are important as they help to reduce the
number of tasks and increase the overall task size (which helps to counteract the task
switching cost). As in so many other cases, it is not a single optimization but rather a
combination of several that results in the best possible performance.


5.  EVALUATION 

We  evaluated  our  system  by  conducting  several  case  studies  of  which  we  present  only 
a  selection  in  this  section.  The  remaining  case  studies  can  be  found  in  [Stork  2013]. 

Inspired by the Problem Based Benchmark Suite (http://www.cs.cmu.edu/~pbbs/) we
developed a dictionary benchmark to evaluate the effectiveness of data groups. Our
implementation (http://goo.gl/nzvLd) is based on a hash table using separate chaining to
handle collisions. We developed two versions: a global version, which uses plain shared
permissions for its internal data structures, and a fine version, in which every bucket
has its own data group.

We evaluated two use cases, one in which we have a unique permission to the
dictionary and one in which we have a shared permission. Our benchmark first inserts the
identity mapping for the numbers 2^0 to 2^16 into the dictionary (initialization). Then we
look up every mapping to check for correctness (checking). We ran each benchmark case
50 times on an eight-core SMP system (using Intel Xeon X5460 CPUs) with 16GB of
memory running Fedora 7 using the Java HotSpot 64-Bit Server VM (build 20.4-b02).
We used a dictionary with 64 hash buckets. To avoid artificial patterns we randomized
the sequence in which the numbers are inserted/checked, using a constant seed to
guarantee reproducibility.

Figure 25 shows the results of our dictionary benchmark. The first bar ‘global/unique’
(15.12s) represents the results of the global dictionary implementation with a
unique permission to the dictionary. The linearity of the unique permission
sequentializes all insert/check operations. In the second bar ‘global/shared’ (15.13s) we have
a shared permission to the dictionary, which allows us to perform our operations in
parallel. This case performs no better because each parallel operation must immediately
synchronize on the entire shared dictionary structure, thus sequentializing all
the accesses. The third bar ‘fine/unique’ (9.99s) uses the implementation which
utilizes data groups for its internal representation. This scenario is faster than either of the
cases using the global implementation because of the use of fine-grained data groups,
one for each bucket. The unique receiver permission allows us to get exclusive group
permissions to the inner groups of the dictionary. This means we do not require
protection to access data within those data groups and therefore we avoid unnecessary
synchronization operations. The last case ‘fine/shared’ (2.32s) also allows the parallel
execution of our operations. Because the implementation associates each bucket with
its own data group, we achieve a very fine-grained protection mechanism which allows
the parallel modification of disjoint parts of the dictionary. This results in a speedup of
6.5X compared to the ‘global/shared’ version.
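The effect of the fine version can be approximated in plain Java by giving every bucket its own lock, so that operations on different buckets never contend. The sketch below is only an analogy: Java monitors stand in for per-bucket data groups, the class and method names are hypothetical, the keys 1 through 2^16 are one reading of the benchmark description, and the randomized insertion order of the actual benchmark is omitted.

```java
import java.util.ArrayList;
import java.util.List;
import java.util.stream.IntStream;

class FineGrainedDictionarySketch {
    // One chain per bucket; each chain object doubles as that bucket's lock.
    private final List<List<int[]>> buckets = new ArrayList<>();

    FineGrainedDictionarySketch(int bucketCount) {
        for (int i = 0; i < bucketCount; i++) {
            buckets.add(new ArrayList<>());
        }
    }

    private List<int[]> bucketFor(int key) {
        return buckets.get(Math.floorMod(key, buckets.size()));
    }

    void insert(int key, int value) {
        List<int[]> chain = bucketFor(key);
        synchronized (chain) {              // protects this bucket only
            for (int[] entry : chain) {
                if (entry[0] == key) { entry[1] = value; return; }
            }
            chain.add(new int[] {key, value});
        }
    }

    Integer lookup(int key) {
        List<int[]> chain = bucketFor(key);
        synchronized (chain) {
            for (int[] entry : chain) {
                if (entry[0] == key) return entry[1];
            }
            return null;
        }
    }

    // Initialization and checking phases; returns the number of correct lookups.
    static long runBenchmark() {
        FineGrainedDictionarySketch dict = new FineGrainedDictionarySketch(64);
        int n = 1 << 16;
        IntStream.rangeClosed(1, n).parallel().forEach(k -> dict.insert(k, k));
        return IntStream.rangeClosed(1, n).parallel()
                .filter(k -> { Integer v = dict.lookup(k); return v != null && v == k; })
                .count();
    }

    public static void main(String[] args) {
        System.out.println(runBenchmark()); // 65536 when every mapping is found
    }
}
```

Replacing the per-bucket monitors with a single lock around the whole table would correspond to the ‘global/shared’ configuration, in which every operation serializes on the same structure.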

Fig. 25: Dictionary Benchmark Results.

Fig. 26: Webserver Benchmark Results.

The second case study we present consists of a web server application
(http://goo.gl/rU3P2). We compiled the web server in two ways. First we compiled it
as a plain Plaid program (resulting in a sequential program) and second we compiled
it with ÆMINIUM enabled. As a control we implemented equivalent Java versions
(sequential and parallel). We hosted the web server on a quad-core machine
(Intel Core 2 Q6600 with 4GB of memory running Ubuntu 11.04 and using the
OpenJDK 64-Bit Server VM (build 20.0-b11)), serving the Python 2.7 documentation
(http://docs.python.org/). We mirrored the whole documentation three times to our local
machine using the puf (http://puf.sourceforge.net/) tool. The puf tool uses up to 20
connections to parallelize the file downloads and therefore allows us to emulate multiple
clients.


ACM  Transactions  on  Programming  Languages  and  Systems,  Vol.  V,  No.  N,  Article  A,  Publication  date:  January  YYYY. 




Fig.  27:  Integral  Performance  Graph 


Figure 26 shows the average performance values measured. The Plaid version of the
Webserver is the slowest (49.1s), followed by the sequential Java version (48.5s). This
makes sense as Plaid is generally slower than Java. The ÆMINIUM-compiled version
of the Webserver is the second fastest (37.4s). It is approximately 31% faster
than its sequentially compiled counterpart. The reason for this is that the Webserver
in the ÆMINIUM-compiled version is able to handle multiple requests in parallel. This
allows the overlapping of communication and computation and results in a higher
throughput. The manually parallelized Java version delivered the best performance
(31.2s). The performance difference between the parallelized Java and the ÆMINIUM
version (31.2s vs. 37.4s) is larger than the difference between their sequential
counterparts (48.5s vs. 49.1s). This effect is caused by
the parallel execution and the overlap of communication and computation, which hides
the communication costs to some degree. Because the communication effect is reduced,
the computation part gains relatively more weight, with the result that the lower base
performance of the Plaid programming language has a greater impact.

In our Integral case study we investigated ÆMINIUM’s capabilities to parallelize
purely functional, highly computation-intensive problems. We developed a small
integral library which computes the integral of a user-defined function. The integral
is computed by subdividing the overall interval into very small intervals for
which we calculate the approximate area, and then adding up all fractions to compute the
area of the whole integral. We evaluated the performance by computing the integral
of the square function (i.e., f(x) = x^2) for the interval [0, 1]. We ran the sequential
Plaid and parallel ÆMINIUM versions on our eight-core machine 20 times each. The
average runtime and standard deviation of both cases are shown in Figure 27. The
Plaid version requires 8.9s while the ÆMINIUM version needs only 4.2s. This results
in a speedup of 2.1, meaning that ÆMINIUM was able to parallelize the program and
achieve some performance improvement. But it also means that the ÆMINIUM version
was only about twice as fast on an eight-core machine, where the ideal speedup would be
closer to eight. Our investigation revealed that the main source of this poor
performance lies in Plaid’s object system. As described previously, Plaid does not
support primitive types, which means that every value in Plaid is an object. This means
that in this computation-heavy application we have to create a new object for every
floating point value we compute. Our investigation showed that this particular
benchmark allocates more than 1.8 billion (1.8 × 10^9) floating point objects. This means
that the overall performance of our benchmark is limited by the throughput of the virtual
machine memory system. This result does not invalidate the ÆMINIUM approach,
because the problem is a current limitation of the Plaid language implementation and
not of ÆMINIUM.

Fig. 28: Annotation Overhead over Java
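The integral computation described in this case study can be sketched in Java as follows. The first method sums the areas of many small sub-intervals using the midpoint rule and a parallel stream; both are substitutions chosen for this sketch, since the paper does not state which approximation rule the library uses and ÆMINIUM extracts the parallelism from permissions rather than from a stream API. The second method performs the same computation with boxed Double values to mimic, loosely, a language in which every floating point value is an object.

```java
import java.util.function.DoubleUnaryOperator;
import java.util.function.Function;
import java.util.stream.IntStream;

class IntegralSketch {
    // Midpoint rule over n equally wide sub-intervals of [a, b],
    // with the partial areas computed and summed in parallel.
    static double integrate(DoubleUnaryOperator f, double a, double b, int n) {
        double h = (b - a) / n;
        return IntStream.range(0, n)
                .parallel()
                .mapToDouble(i -> f.applyAsDouble(a + (i + 0.5) * h) * h)
                .sum();
    }

    // Same rule, sequential, with boxed values: every intermediate result
    // is conceptually a fresh Double object (the JIT may remove some).
    static Double integrateBoxed(Function<Double, Double> f, Double a, Double b, int n) {
        Double h = (b - a) / n;
        Double area = 0.0;
        for (int i = 0; i < n; i++) {
            Double x = a + (i + 0.5) * h;
            area = area + f.apply(x) * h;
        }
        return area;
    }

    public static void main(String[] args) {
        // Integral of x^2 over [0, 1]; the exact value is 1/3.
        System.out.println(integrate(x -> x * x, 0.0, 1.0, 1_000_000));
        System.out.println(integrateBoxed(x -> x * x, 0.0, 1.0, 1_000_000));
    }
}
```

The boxed variant produces several short-lived objects per sub-interval, which illustrates why a benchmark of this kind can end up bound by allocation throughput instead of by arithmetic.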

We evaluated our annotation overhead by comparing our ÆMINIUM programs to
their equivalent Java versions. We counted how many lines of the source code (SLOC,
measured with wc) we had to modify by annotating types (i.e., adding permission
information to types), how often we had to specify additional group parameters to method
calls, and how many ÆMINIUM-specific operations we used (e.g., atomic blocks).
Figure 28 shows the numbers for the case studies we presented. The values marked with
‘V’ are fully annotated versions and the values marked with ‘f’ are programs which use
Plaid’s default permission mechanism, which allows omitting the permission annotation
by specifying a default permission in the state declaration. This allows the
compiler to automatically insert a default permission wherever the user did not specify a
permission explicitly (e.g., in Java all strings are immutable by design and therefore
the default permission for strings could be immutable, which allows the user to simply
write String instead of immutable String when specifying a string type). The
numbers show that type annotations are the most common source of overhead and
that Plaid’s default permission helps to reduce it. The second important observation
is that the more developers specify, the more performance the compiler can achieve.
This means users can start with a simple version of a program and then incrementally
add more annotations to increase the performance. It is worth pointing out that using
Plaid’s default permission approach we are able to extract concurrency in the
Webserver example without the need for any additional annotations. Overall we achieve
a reasonable 7.9% annotation overhead, which is comparable to the 10.7% reported by
DPJ [Bocchino et al. 2009]. Further improvements to our system (e.g., type inference)
should allow us to further mitigate the programmer’s burden. The reader should also
take into account that the access permission information in Plaid serves additional
purposes (e.g., checking typestate).

6.  FUTURE  WORK 

While  our  current  prototype  system  demonstrated  the  potential  of  our  approach,  it 
has  a  few  shortcomings  we  would  like  to  address  in  future  versions.  The  following 
paragraphs  elaborate  the  most  interesting  and  useful  directions  for  future  extensions. 

Permissions and data groups provide a nice abstraction for many situations but
there are corner cases in which they can be cumbersome or not sufficient. For instance,
the programmer may want to impose an order on two operations, perhaps because
the operations have effects that are not currently captured in our permission system
(e.g., I/O). In this case the programmer would have to write “ghost permissions” that
represent the effect. Another situation arises when the user has to write multiple
versions of the same method for different permission configurations. To solve these
issues, investigation into refined permission abstractions is necessary. These new
abstractions should not only give the programmer finer control, but also allow
the compiler to infer permissions and implementations when possible.

Our current implementation does not support global state. While global state is
generally considered a bad thing, there are situations where it is extremely convenient.
One example is I/O methods such as println/printf, which rely on global
state to access the standard output device.

In the current system we use static costs for the operations and method calls. We
already distinguish between cheap and heavy functions in order to optimize the task
graph. One way to improve this approach would be to use an aggressive static
analysis to try to prove a bound on method costs. Another, and more promising, approach
would be to have a just-in-time (JIT) version of our compiler. This JIT would analyse
the cost of functions at run time and then optimize code depending on the gathered
profiling information.

7.  RELATED  WORK 

Deterministic Parallel Java (DPJ, [Bocchino et al. 2009]) is a parallel programming
language with deterministic-by-default semantics. DPJ uses regions (which correspond to
ÆMINIUM’s data groups) to partition the store and provides explicit fork/join
parallelism. DPJ has special language constructs (e.g., for loops, cobegin blocks, etc.) which
allow parallel execution of statements that do not interfere with each other. Code
outside those constructs executes sequentially. DPJ recently added support for race-free
non-deterministic parallelism as well [Bocchino Jr et al. 2011].

The most significant difference between ÆMINIUM and DPJ is that programmers in
ÆMINIUM think and write code with permissions in mind. Parallelism in ÆMINIUM is
implicitly inferred based on the flow of those permissions. Implicit
parallelism means that ÆMINIUM programs are not tied to a particular amount or
granularity of parallelism specified by the programmer; instead, the runtime is free to adapt to
the parallelism available in the underlying hardware. Likewise, the runtime can
parallelize a library, or not, depending on whether the client is already taking advantage
of parallel resources.

On a technical level, our implicit parallelism uses a dataflow model, which can in
some programs capture more parallelism than can be expressed in DPJ’s fork/join
model (cf. Section 2.4). This dataflow computation makes our formal system quite
different from prior fork/join or thread-based type systems. Our split block (developed
independently of DPJ’s nondeterminism, see [Stork et al. 2012a]) also differs
conceptually from DPJ’s nondeterministic parallelism construct: it does not specify that code
executes in parallel, but rather that two blocks of data can be accessed independently
without affecting (high-level) program semantics. Finally, Plaid’s permissions and data
groups are tied to individual objects, in contrast to DPJ’s globally-declared regions; our
design is more object-based, and helps express idioms such as uniqueness that are not
supported in DPJ.

Craik et al. [Craik and Kelly 2010] describe a system which uses ownership
information to automatically parallelize code in a dataflow style. Craik’s ownership contexts
are similar to ÆMINIUM’s data groups, but they do not have the concept of unique or
immutable permissions. Their system supports only deterministic parallelism. While
they provide an argument for soundness, our formal model goes further in
incorporating a small-step operational semantics model of parallelism and a rigorous
progress/preservation proof approach.
The FX programming language [Gifford and Lucassen 1986] uses an implicit
dataflow approach similar to ÆMINIUM. The FX language classifies every expression
into one of the following four categories: producer (i.e., can read, write and allocate
memory), observer (i.e., can read and allocate memory), function (i.e., can allocate memory)
and pure (i.e., side-effect free). Based on the effects of each expression the system can
compute a dataflow graph based on the interference on the global heap and extract
concurrency. Compared to ÆMINIUM, FX only supports deterministic parallelism and
computes interference using effects with a global granularity rather than fine-grained
data groups.

Data-Centric Synchronization (DCS) [Vaziri et al. 2010] is an explicitly parallel
system where synchronization is expressed by associating object fields with atomic sets.
Each method declares which atomic sets it accesses and the run-time system inserts
synchronization to ensure that no methods with conflicting atomic sets will be executed
at the same time.

Fortress [Allen et al. 2008] has concurrent-by-default evaluation semantics for some
language constructs (e.g., loops). When the programmer uses these constructs, she is
indicating that it is safe to parallelize execution. ÆMINIUM takes this
concurrent-by-default principle and applies it to the whole language, not just a few language
constructs. Furthermore it provides a type system for controlling parallelism according
to dependencies which, in the case of Fortress, might be missed by the programmer,
causing errors.

ÆMINIUM’s dataflow parallelism generalizes fork-join parallelism, which was
notably supported by Cilk [Blumofe et al. 1995]. Cilk extends C with three additional
keywords for explicit parallelism: cilk, spawn and sync. Every method annotated with
cilk can be asynchronously spawned off with the spawn keyword; the sync keyword is used
to wait for a previously started asynchronous task. ÆMINIUM essentially attempts to
infer spawn and sync points based on typed dependencies, and can also capture more
general dataflow patterns of parallelism.

Axum (formerly known as Maestro) [Microsoft Corporation 2009] is an actor-based
programming language. Axum comes with several operators to allow the explicit
construction of data flow graphs, which can be hierarchically composed. For efficiency
reasons, Axum also provides domains, containers for state, which allow associated
actors to access the enclosed state. Actors can either be readers or writers of shared
state, and scheduling follows the one-writer or multiple-reader model. Axum and
ÆMINIUM share similar concepts, in particular the data flow approach and the use of
data groups/domains combined with explicit access specifications.

Boyapati et al. [Boyapati et al. 2002] describe an explicitly concurrent extension to Java
that associates each object with an owner (related to our data groups), and checks that
the owner is locked before accessing the object. Deadlocks are also prohibited via a lock
ordering protocol.

Athapascan-1 [Galilee et al. 1998] is a language that dynamically computes and uses
a data flow graph to execute the code. In Athapascan-1 the user writes tasks which can
be asynchronously spawned off. Tasks are annotated with information about which
shared data they access and in which way. The semantics of Athapascan-1 preserves
the deterministic result of execution and can roughly be seen as a dynamic version of
DPJ. Athapascan-1 computes the data flow graph dynamically, while ÆMINIUM
computes it statically.

SharC [Anderson et al. 2008] is a data race checker for C programs. SharC uses a
lightweight type annotation system which bears some resemblance to ÆMINIUM’s
permission and data group approach. SharC has private and readonly annotations which
compare to ÆMINIUM’s unique and immutable permissions. In SharC, all shared data
accesses need to be marked with a locked(lock) annotation indicating which lock needs to be held
before accessing the corresponding data. This resembles ÆMINIUM’s shared
permissions associated with data groups. To allow for more flexibility, SharC complements
its static type system with dynamic runtime checks. Unlike ÆMINIUM, SharC
is a checker only and can only check that a user-parallelized program is accessing its
state in a safe manner.

The biggest differentiator for ÆMINIUM is that while nearly all the systems above
have explicit parallel programming constructs or libraries, in the case of ÆMINIUM
code executes in parallel by default, to the extent allowed by permission dependencies.
Compared to the implicitly parallel models in FX and Craik et al., ÆMINIUM supports
a richer set of permissions that enables expressing the programs from our case studies.

8.  CONCLUSION 

We presented ÆMINIUM, an automatic parallelization methodology with type-based
safe deterministic and non-deterministic concurrency. ÆMINIUM uses the
permission flow and datagroups to automatically parallelize code and supports dataflow
and fork/join parallelism. We further presented μÆMINIUM, a core calculus for the
concurrent-by-default programming language ÆMINIUM along with its soundness
proof. We presented our initial prototype implementation and several case studies
showing the benefits and applicability of the ÆMINIUM concept to selected use cases.
The ÆMINIUM approach is modular, composable, incremental and provably avoids race
conditions. The fundamental concept of ÆMINIUM is generally applicable and not
limited to object oriented languages. With ÆMINIUM programmers can focus on the core
functionality of their applications by shifting concerns about race conditions and
parallelization to ÆMINIUM.

ACKNOWLEDGMENTS 

This work was partially supported by the Portuguese Research Agency - FCT, through a scholarship
(SFRH/BD/33522/2008), CISUC (R&D Unit 326/97) and the CMU|Portugal program (R&D Project
Aeminium CMU-PT/SE/0038/2008). Supporting work on the Plaid language was funded through the US NSF
grant #CCF-1116907.

References 

Acar, U. A., Charguéraud, A., and Rainey, M. 2011. Oracle scheduling: controlling granularity in
implicitly parallel languages. In OOPSLA.

Adve, S. V. and Boehm, H.-J. 2010. Memory models: a case for rethinking parallel languages and
hardware. Commun. ACM 53.

Aldrich, J., Beckman, N. E., Bocchino, R., Naden, K., Saini, D., Stork, S., and Sunshine, J.
2012. The Plaid Language: Typed Core Specification. Tech. Rep. CMU-ISR-12-103, Carnegie Mellon
University.

Aldrich, J., Sunshine, J., Saini, D., and Sparks, Z. 2009. Typestate-oriented programming. In
OOPSLA.

Allen, E., Chase, D., Hallett, J., Luchangco, V., Maessen, J., Ryu, S., Steele Jr, G., and
Tobin-Hochstadt, S. 2008. The Fortress language specification version 1.0. Tech. Rep., Sun
Microsystems, Inc.

Anderson, Z., Gay, D., Ennals, R., and Brewer, E. 2008. SharC: checking data sharing strategies
for multithreaded C. In Proceedings of the 2008 ACM SIGPLAN conference on Programming language
design and implementation. PLDI ’08. ACM, New York, NY, USA, 149-158.

Beckman, N. E., Bierhoff, K., and Aldrich, J. 2008. Verifying correct usage of atomic blocks and
typestate. In OOPSLA.

Blelloch, G. E. and Greiner, J. 1996. A Provable Time and Space Efficient Implementation of NESL.
In ICFP.

Blumofe, R. D., Joerg, C. F., Kuszmaul, B. C., Leiserson, C. E., Randall, K. H., and Zhou, Y.
1995. Cilk: an efficient multithreaded runtime system. In Symposium on PPoPP.




Bocchino,  Jr.,  R.  L.,  Adve,  V.  S.,  Dig,  D.,  Adve,  S.  V.,  Heumann,  S.,  Komuravelli,  R.,  Overbey,  J., 
Simmons,  P.,  SUNG,  H.,  AND  VAKILIAN,  M.  2009.  A  type  and  effect  system  for  deterministic  parallel 
Java.  In  OOPLSA. 

Bocchino  Jr,  R.,  Heumann,  S.,  Honarmand,  N.,  Adve,  S.,  Adve,  V.,  Welc,  A.,  and  Shpeisman,  T. 
2011.  Safe  nondeterminism  in  a  deterministic-by-default  parallel  language.  In  POPL. 

BOEHM,  H.-J.  2009.  Transactional  Memory  Should  Be  an  Implementation  Technique,  Not  a  Programming 
Interface.  Tech.  Rep.  HPL-2009-45,  HP  Laboratories. 

BOYAPATI,  C.,  Lee,  R.,  AND  Rinard,  M.  2002.  Ownership  types  for  safe  programming:  preventing  data 
races  and  deadlocks.  In  OOPSLA. 

BOYLAND,  J.  2003.  Checking  Interference  with  Fractional  Permissions.  In  Static  Analysis:  10th  Interna¬ 
tional  Symposium. 

CLARKE,  D.  G.,  Potter,  J.  M.,  AND  Noble,  J.  1998.  Ownership  types  for  flexible  alias  protection.  In 
OOPSLA. 

Click, C. and Paleczny, M. 1995. A simple graph-based intermediate representation. In Papers from
the  1995  ACM  SIGPLAN  workshop  on  Intermediate  representations.  IR  ’95.  ACM,  New  York,  NY,  USA, 
35-49. 

Craik, A. and Kelly, W. 2010. Using Ownership to Reason about Inherent Parallelism in Object-Oriented
Programs.  In  Compiler  Construction.  Springer  Berlin  /  Heidelberg. 

Fähndrich, M. and DeLine, R. 2002. Adoption and focus: practical linear types for imperative
programming. In PLDI '02: Proceedings of the ACM SIGPLAN 2002 Conference on Programming language
design and implementation. Vol. 37. ACM, New York, NY, USA, 13-24.

Galilee, F., Cavalheiro, G. G., Roch, J.-L., and Doreille, M. 1998. Athapascan-1: On-Line
Building Data Flow Graph in a Parallel Language. In Parallel Architectures and Compilation Techniques.

Gifford, D. K. and Lucassen, J. M. 1986. Integrating functional and imperative programming. In LFP.

Girard, J.-Y. 1987. Linear logic. Theoretical Computer Science 50, 1.

Goldberg, A. and Robson, D. 1983. Smalltalk-80: the language and its implementation. Addison-Wesley
Longman  Publishing  Co.,  Inc.,  Boston,  MA,  USA. 

Hindman, B. and Grossman, D. 2006. Atomicity via source-to-source translation. In Proceedings of the
2006  workshop  on  Memory  system  performance  and  correctness.  MSPC  ’06.  ACM,  New  York,  NY,  USA, 
82-91. 

Igarashi, A., Pierce, B. C., and Wadler, P. 2001. Featherweight Java: a minimal core calculus for Java
and  GJ.  In  OOPSLA. 

Leino, K. R. M. 1998. Data groups: specifying the modification of extended state. In OOPSLA.

Leino,  K.  R.  M.,  Poetzsch-Heffter,  A.,  and  Zhou,  Y.  2002.  Using  data  groups  to  specify  and  check 
side  effects.  ACM  SIGPLAN  Notices  37,  5,  246-257. 

McKeeman,  W.  M.  1965.  Peephole  optimization.  Commun.  ACM  8,  443-444. 

Microsoft  Corporation  2009.  Axum  Programmer’s  Guide.  Microsoft  Corporation. 
http://msdn.microsoft.com/en-us/devlabs/dd795202.aspx. 

Moggi, E. 1991. Notions of computation and monads. Inf. Comput. 93, 1, 55-92.

Moore, K. F. and Grossman, D. 2008. High-level small-step operational semantics for transactions. In
POPL. 

Naden, K., Bocchino, R., Aldrich, J., and Bierhoff, K. 2012. A type system for borrowing
permissions. In Proceedings of the 39th annual ACM SIGPLAN-SIGACT symposium on Principles of
programming languages. POPL '12. ACM, New York, NY, USA, 557-570.

Pierce, B. C. 2002. Types and programming languages. MIT Press, Cambridge, MA, USA.

Rumbaugh, J. 1975. A parallel asynchronous computer architecture for data flow programs. Ph.D. thesis,
MIT.  MIT-LCS-TR-150. 

Sarkar, V. 1989. Partitioning and Scheduling Parallel Programs for Multiprocessors. MIT Press,
Cambridge, MA, USA.

Stork, S. 2013. Æminium: Freeing Programmers from the Shackles of Sequentiality. Ph.D. thesis, School
of  Computer  Science,  Carnegie  Mellon  University. 

Stork, S., Aldrich, J., and Marques, P. published July 2010, revised February 2012a. micro-Æminium
Language  Specification.  Tech.  Rep.  CMU-ISR-10-125R2,  Carnegie  Mellon  University. 

Stork,  S.,  Marques,  P.,  and  Aldrich,  J.  2009.  Concurrency  by  default:  using  permissions  to  express 
dataflow  in  stateful  programs.  In  Onward! 

Stork, S., Naden, K., and Sunshine, J. 2012b. Æminium Code Repository. http://goo.gl/olbMs.




Sunshine, J., Naden, K., Stork, S., Aldrich, J., and Tanter, E. 2011. First-class state change in
Plaid. In Proceedings of the 2011 ACM international conference on Object oriented programming systems
languages and applications. OOPSLA '11. ACM, New York, NY, USA, 713-732.

Vaziri, M., Tip, F., Dolby, J., and Vitek, J. 2010. A Type System for Data-Centric Synchronization. In
OOPSLA. 




